[ https://issues.apache.org/jira/browse/IMPALA-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17416467#comment-17416467 ]

ASF subversion and git services commented on IMPALA-10901:
----------------------------------------------------------

Commit c925807b1a3eacf1ca3d0e78f6a0fa043b350174 in impala's branch 
refs/heads/master from Alexander Saydakov
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c925807 ]

IMPALA-10901 cleaner and faster operations with datasketches

- serialize using bytes instead of stream
- avoid unnecessary constructor during deserialization
- simplified code slightly
- added original exception message to re-thrown generic message

Change-Id: I306a2489dac0f4d2d475e8f9987cd58bf95474bb
Reviewed-on: http://gerrit.cloudera.org:8080/17818
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
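The last bullet of the commit message is about keeping the underlying error text when a failure is re-thrown with a generic message. The following is a minimal sketch of that pattern only. It is not the code from the commit: `ParseSketchBytes` is a hypothetical deserializer that throws `std::invalid_argument` on short input, and `DeserializeOrExplain` is a hypothetical wrapper.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a sketch deserializer that rejects bad input.
int ParseSketchBytes(const char* data, std::size_t len) {
  if (data == nullptr || len < 8) {
    throw std::invalid_argument("at least 8 bytes expected, " +
                                std::to_string(len) + " given");
  }
  return static_cast<int>(len);
}

// Hypothetical wrapper: re-throws with a generic message, but appends the
// original exception text so the root cause is not lost.
int DeserializeOrExplain(const char* data, std::size_t len) {
  try {
    return ParseSketchBytes(data, len);
  } catch (const std::exception& e) {
    throw std::runtime_error(std::string("Unable to deserialize sketch: ") +
                             e.what());
  }
}
```

Without the appended `e.what()`, a caller would only see "Unable to deserialize sketch" and could not tell a truncated buffer from any other failure.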


> Clean up Datasketches serialization and deserialization
> -------------------------------------------------------
>
>                 Key: IMPALA-10901
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10901
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 4.0.0
>            Reporter: Gabor Kaszab
>            Priority: Major
>              Labels: datasketches
>
> (copy-paste from a mail thread)
> Regarding serialization to bytes as opposed to a stream: this has nothing to
> do with the BINARY data type in Impala.
> Currently I see in the Impala code something like this (simplified):
> std::stringstream tmp;
> sketch.serialize(tmp);
> std::string str = tmp.str(); // in StringStreamToStringVal
> StringVal result(context, str.size());
> memcpy(result.ptr, str.c_str(), str.size());
> You could do it faster like this:
> auto bytes = sketch.serialize();
> StringVal result(context, bytes.size());
> memcpy(result.ptr, bytes.data(), bytes.size());
> Regarding the unnecessary constructor during deserialization, I see code like
> this (HLL is an example, but the pattern is the same):
> datasketches::hll_sketch src_sketch(DS_SKETCH_CONFIG, DS_HLL_TYPE); // constructs an empty sketch, which is not needed
> DeserializeDsSketch(src, &src_sketch); // passes it into a function, which replaces it by an assignment (hopefully a move, not a copy)
> // in the function
> *sketch = T::deserialize((void*)serialized_sketch.ptr, serialized_sketch.len);
> This can be done without the unnecessary constructor:
> datasketches::hll_sketch src_sketch = datasketches::hll_sketch::deserialize(
>     (void*)serialized_sketch.ptr, serialized_sketch.len);
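The first suggestion in the quoted description (serializing to bytes instead of through a stream) can be illustrated with a self-contained sketch. `ToySketch` and `ToyStringVal` are hypothetical stand-ins, not the DataSketches or Impala UDF types; the only assumption carried over from the text is that a sketch can either write itself into a `std::ostream` or return its bytes directly. In the stream path the payload is written into the `stringstream` buffer, copied out again by `str()`, and copied a third time into the result; in the bytes path it is built once in a vector and copied once into the result.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch type exposing both serialization styles.
struct ToySketch {
  std::vector<std::uint8_t> state{1, 2, 3, 4};

  // Stream-based serialization: writes the state into any std::ostream.
  void serialize(std::ostream& os) const {
    os.write(reinterpret_cast<const char*>(state.data()),
             static_cast<std::streamsize>(state.size()));
  }

  // Byte-based serialization: returns the state as one contiguous buffer.
  std::vector<std::uint8_t> serialize() const { return state; }
};

// Hypothetical stand-in for a fixed-length UDF result buffer.
struct ToyStringVal {
  std::unique_ptr<std::uint8_t[]> ptr;
  std::size_t len;
  explicit ToyStringVal(std::size_t n) : ptr(new std::uint8_t[n]), len(n) {}
};

// Old path: write into the stream buffer, copy out via str(), copy into result.
ToyStringVal SerializeViaStream(const ToySketch& sketch) {
  std::stringstream tmp;
  sketch.serialize(tmp);
  std::string str = tmp.str();
  ToyStringVal result(str.size());
  std::memcpy(result.ptr.get(), str.data(), str.size());
  return result;
}

// New path: build the bytes once, then copy them into the result.
ToyStringVal SerializeViaBytes(const ToySketch& sketch) {
  auto bytes = sketch.serialize();
  ToyStringVal result(bytes.size());
  std::memcpy(result.ptr.get(), bytes.data(), bytes.size());
  return result;
}
```

Both functions yield identical bytes; they differ only in how many intermediate buffers are allocated and filled along the way.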
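The second suggestion in the quoted description (deserializing without first constructing an empty sketch) can be made visible with a type that counts its constructions. `CountedSketch`, its static `deserialize`, `DeserializeInto`, and `DeserializeDirect` are all hypothetical stand-ins shaped after the pattern in the text, not Impala or DataSketches code. The sketch assumes C++17, where initializing an object from a returned prvalue is guaranteed not to create an extra temporary.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch type that counts constructions and move assignments.
struct CountedSketch {
  inline static int constructions = 0;
  inline static int move_assignments = 0;
  std::vector<std::uint8_t> state;

  // Stands in for constructing an empty sketch from a config.
  CountedSketch() { ++constructions; }
  explicit CountedSketch(std::vector<std::uint8_t> s) : state(std::move(s)) {
    ++constructions;
  }
  CountedSketch(CountedSketch&& other) noexcept : state(std::move(other.state)) {
    ++constructions;
  }
  CountedSketch& operator=(CountedSketch&& other) noexcept {
    state = std::move(other.state);
    ++move_assignments;
    return *this;
  }

  // Hypothetical factory shaped like T::deserialize(const void*, size_t).
  static CountedSketch deserialize(const void* bytes, std::size_t len) {
    const auto* p = static_cast<const std::uint8_t*>(bytes);
    // Returning a prvalue: no extra copy or move is made in C++17.
    return CountedSketch(std::vector<std::uint8_t>(p, p + len));
  }
};

// Old pattern: the caller builds an empty sketch, which is then overwritten.
template <class T>
void DeserializeInto(const void* bytes, std::size_t len, T* sketch) {
  *sketch = T::deserialize(bytes, len);  // constructs a temporary, then move-assigns
}

// Suggested pattern: the result of deserialize() initializes the sketch directly.
template <class T>
T DeserializeDirect(const void* bytes, std::size_t len) {
  return T::deserialize(bytes, len);
}
```

Under these assumptions, `CountedSketch a; DeserializeInto(raw, n, &a);` performs two constructions and one move assignment, while `CountedSketch b = DeserializeDirect<CountedSketch>(raw, n);` performs a single construction. With a real sketch, the saving is whatever work the skipped empty constructor would have done.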



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
