[ 
https://issues.apache.org/jira/browse/IMPALA-12111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17720013#comment-17720013
 ] 

ASF subversion and git services commented on IMPALA-12111:
----------------------------------------------------------

Commit 4d9f50eb74208fa21c70964b0015209bb3f973a8 in impala's branch 
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4d9f50eb7 ]

IMPALA-12111: Speed up DATE to STRING conversion

Before this patch, DATE to STRING conversion seemed slow in
general (slower than TIMESTAMP to STRING). This was especially
visible on the coordinator, where result DATEs are returned
as STRINGs in HS2/Beeswax and the conversion happens on a
single thread.

The main cause seems to be using std::stringstream in
DateValue::ToString(). The patch switches to using
impala::TimestampParser::Format() similarly to TimestampValue.
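
To illustrate why dropping std::stringstream can help, here is a minimal
sketch that formats a date both ways. It is not the Impala code: SimpleDate,
FormatWithStream and FormatDirect are hypothetical names, the year is assumed
to be in the range 0..9999, and the actual patch goes through
impala::TimestampParser::Format() rather than a hand-written digit writer.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical stand-in for an already-decoded calendar date.
// This is NOT Impala's DateValue; it only carries year/month/day.
struct SimpleDate {
  int year;   // assumed 0..9999
  int month;  // 1..12
  int day;    // 1..31
};

// Baseline: builds a std::stringstream on every call. Constructing the
// stream and going through its formatting layers is comparatively costly
// for a 10-character result.
std::string FormatWithStream(const SimpleDate& d) {
  std::stringstream ss;
  ss << std::setfill('0') << std::setw(4) << d.year << '-'
     << std::setw(2) << d.month << '-' << std::setw(2) << d.day;
  return ss.str();
}

// Direct variant: writes the digits of "YYYY-MM-DD" into a fixed-size
// buffer, with no stream object involved.
std::string FormatDirect(const SimpleDate& d) {
  assert(d.year >= 0 && d.year <= 9999);
  assert(d.month >= 1 && d.month <= 12);
  assert(d.day >= 1 && d.day <= 31);
  char buf[10];
  buf[0] = static_cast<char>('0' + d.year / 1000);
  buf[1] = static_cast<char>('0' + (d.year / 100) % 10);
  buf[2] = static_cast<char>('0' + (d.year / 10) % 10);
  buf[3] = static_cast<char>('0' + d.year % 10);
  buf[4] = '-';
  buf[5] = static_cast<char>('0' + d.month / 10);
  buf[6] = static_cast<char>('0' + d.month % 10);
  buf[7] = '-';
  buf[8] = static_cast<char>('0' + d.day / 10);
  buf[9] = static_cast<char>('0' + d.day % 10);
  return std::string(buf, sizeof(buf));
}
```

Both functions return the same string for the same input; the second one
simply skips the stream. How much this saves depends on the compiler and
standard library, so the sketch makes no claim about the speedup itself.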

HS2 result set generation is also changed to avoid using
stringstream for TIMESTAMP/DATE and call ToString() directly.

Benchmarks:
- Added benchmark that shows ~4x improvement for DateValue.ToString().
- Manually tested EE scenario: RowMaterializationTimer dropped from
  1.7s to 0.6s in
  impala-shell -B -q "select cast(l_shipdate as date) from tpch_parquet.lineitem;" > /dev/null
  (note that the query above converts STRING to DATE first and then
   from DATE to std::string before returning it in HS2 - the
   improvement comes from the second conversion)

Testing:
- ran core tests

Change-Id: I9a233ae92b1461fc5c47d8345667e36c2632f4c4
Reviewed-on: http://gerrit.cloudera.org:8080/19829
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Speed up DATE to STRING conversion
> ----------------------------------
>
>                 Key: IMPALA-12111
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12111
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: be
>            Reporter: Csaba Ringhofer
>            Priority: Major
>
> Converting DATE to STRING is slower than converting TIMESTAMP to STRING, which 
> is surprising because TIMESTAMP handling should need more work. The culprit seems to be 
> the use of stringstream in DateValue.ToString(): 
> https://github.com/apache/impala/blob/14698c8b99b80db7e6fd99900e32b6742bef1662/be/src/runtime/date-value.cc#L433
> As dates are returned as strings to the client, this also slows down the 
> materialization of results on the coordinator.



