[ 
https://issues.apache.org/jira/browse/FLINK-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071491#comment-14071491
 ] 

Aljoscha Krettek edited comment on FLINK-987 at 7/23/14 8:24 AM:
-----------------------------------------------------------------

So, what do you think of the interface?

I also ran some tests to measure the overhead of using the seeking feature. 
For this I added a new String serializer that does some fake seeking (5 tell() 
and 9 seek() calls) to simulate writing a header, which would be the prevalent 
use case. The test writes 100000 random strings to an in-memory paging 
output view with a segment size of 8000. It is repeated 10 times and the 
runtimes are added up. For the non-seeking String serializer I get a runtime of 
22000 msecs; for the seeking String serializer I get 23000 msecs. What other 
testing would you propose, [~StephanEwen]?
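
For readers unfamiliar with the "seek back to write a header" pattern that the fake seeking simulates, here is a minimal sketch. It is not the serializer used in the test: the class SeekableBuffer, its tell()/seek() methods and writeWithHeader() are hypothetical names for illustration, and a plain java.nio.ByteBuffer stands in for the paging output view.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SeekingHeaderSketch {

    /** Hypothetical seekable output; a ByteBuffer stands in for the paged output view. */
    static final class SeekableBuffer {
        private final ByteBuffer buf;

        SeekableBuffer(int capacity) { buf = ByteBuffer.allocate(capacity); }

        int tell() { return buf.position(); }          // current write position
        void seek(int pos) { buf.position(pos); }      // jump to an absolute position
        void writeInt(int v) { buf.putInt(v); }
        void write(byte[] bytes) { buf.put(bytes); }
        int readInt(int pos) { return buf.getInt(pos); } // absolute read, position unchanged
    }

    /**
     * Writes [int length][payload bytes]. The length is not known up front in
     * this sketch, so a placeholder is written first and patched afterwards.
     * Returns the position at which the record starts.
     */
    static int writeWithHeader(SeekableBuffer out, String value) {
        int headerPos = out.tell();
        out.writeInt(0);                               // placeholder for the length
        int payloadStart = out.tell();
        out.write(value.getBytes(StandardCharsets.UTF_8));
        int payloadEnd = out.tell();
        out.seek(headerPos);                           // go back to the header
        out.writeInt(payloadEnd - payloadStart);       // patch in the real length
        out.seek(payloadEnd);                          // continue after the payload
        return headerPos;
    }

    public static void main(String[] args) {
        SeekableBuffer out = new SeekableBuffer(64);
        int first = writeWithHeader(out, "flink");
        int second = writeWithHeader(out, "seek");
        System.out.println(out.readInt(first));        // prints 5
        System.out.println(out.readInt(second));       // prints 4
    }
}
```

Each record in this sketch costs three tell() and two seek() calls, so it only approximates the call pattern measured above.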


was (Author: aljoscha):
So, what do you think of the interface?

I also ran some tests to measure the overhead of using the seeking feature. 
For this I added a new String serializer that does some fake seeking to 
simulate writing a header, which would be the prevalent use case. The test 
writes 100000 random strings to an in-memory paging output view with a segment 
size of 8000. It is repeated 10 times and the runtimes are added up. For the 
non-seeking String serializer I get a runtime of 22000 msecs; for the seeking 
String serializer I get 23000 msecs. What other testing would you propose, 
[~StephanEwen]?

> Extend TypeSerializers and -Comparators to work directly on Memory Segments
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-987
>                 URL: https://issues.apache.org/jira/browse/FLINK-987
>             Project: Flink
>          Issue Type: Improvement
>          Components: Local Runtime
>    Affects Versions: 0.6-incubating
>            Reporter: Stephan Ewen
>            Assignee: Aljoscha Krettek
>             Fix For: 0.6-incubating
>
>
> As per discussion with [~till.rohrmann], [~uce], [~aljoscha], we suggest 
> changing the way that the TypeSerializers/Comparators and 
> DataInputViews/DataOutputViews work.
> The goal is to allow more flexibility in the construction of the binary 
> representation of data types, and to allow partial deserialization of 
> individual fields. Both are currently prevented by the fact that the 
> abstraction of the memory (into which the data goes) is a stream abstraction 
> ({{DataInputView}}, {{DataOutputView}}).
> An idea is to offer a random-access, buffer-like view for construction and 
> random-access deserialization, as well as various methods to copy elements in 
> a binary fashion between such buffers and streams.
> A possible set of methods for the {{TypeSerializer}} could be:
> {code}
> long serialize(T record, TargetBuffer buffer);
>       
> T deserialize(T reuse, SourceBuffer source);
>       
> void ensureBufferSufficientlyFilled(SourceBuffer source);
>       
> <X> X deserializeField(X reuse, int logicalPos, SourceBuffer buffer);
>       
> int getOffsetForField(int logicalPos, int offset, SourceBuffer buffer);
>       
> void copy(DataInputView in, TargetBuffer buffer);
>       
> void copy(SourceBuffer buffer, DataOutputView out);
>       
> void copy(DataInputView source, DataOutputView target);
> {code}
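
The partial deserialization that the quoted description aims for can be illustrated with a small sketch. It is an assumption-laden simplification, not the proposed API: java.nio.ByteBuffer stands in for both TargetBuffer and SourceBuffer, the record layout (int, long, double, all fixed width) is made up for the example, and the method signatures are reduced versions of getOffsetForField and deserializeField.

```java
import java.nio.ByteBuffer;

public class FieldAccessSketch {

    // Hypothetical fixed-width layout: field 0 = int id, field 1 = long timestamp,
    // field 2 = double score.
    static final int[] FIELD_WIDTHS = {Integer.BYTES, Long.BYTES, Double.BYTES};

    /** Writes one record at the buffer's current position; returns the bytes written. */
    static int serialize(int id, long timestamp, double score, ByteBuffer target) {
        int start = target.position();
        target.putInt(id).putLong(timestamp).putDouble(score);
        return target.position() - start;
    }

    /** Absolute offset of a field, given the offset at which the record starts. */
    static int getOffsetForField(int logicalPos, int recordOffset) {
        int pos = recordOffset;
        for (int i = 0; i < logicalPos; i++) {
            pos += FIELD_WIDTHS[i];
        }
        return pos;
    }

    /** Reads only the timestamp field; the other fields are never touched. */
    static long deserializeTimestamp(int recordOffset, ByteBuffer source) {
        return source.getLong(getOffsetForField(1, recordOffset));
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(64);
        int firstSize = serialize(1, 1000L, 0.5, buffer);
        serialize(2, 2000L, 0.75, buffer);

        // The second record starts right after the first one.
        System.out.println(deserializeTimestamp(firstSize, buffer)); // prints 2000
    }
}
```

With a stream abstraction the reader would have to consume the id before reaching the timestamp; with random access it can jump straight to the field. Variable-length fields would need offsets stored in the record itself, which this sketch leaves out.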



--
This message was sent by Atlassian JIRA
(v6.2#6252)
