[jira] [Assigned] (IMPALA-708) optimize hdfs-table-sink output partition hashing

Dan Hecht (JIRA) Wed, 02 May 2018 11:35:44 -0700

     [ https://issues.apache.org/jira/browse/IMPALA-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Dan Hecht reassigned IMPALA-708:
--------------------------------

    Assignee:     (was: Dan Hecht)

This is less relevant with clustering, where the hashing no longer happens for 
each row.
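
To illustrate why clustered input makes the per-row hash lookup less of a
concern, here is a minimal C++ sketch. It is not Impala's code: the
PartitionLookup class, its Find method, and the CountHashLookups helper are
hypothetical names made up for this example, and partition keys are modeled as
plain strings. The idea is that when consecutive rows share a partition key,
remembering the previous key lets the sink skip the hash table entirely, so
the number of lookups tracks the number of key runs rather than the number of
rows.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a table sink's partition lookup (illustration only).
class PartitionLookup {
 public:
  // Returns the partition id for `key`, assigning a new id on first sight.
  int Find(const std::string& key) {
    // Fast path: same key as the previous row, so no hash table access.
    if (has_last_ && key == last_key_) return last_id_;

    // Slow path: one hash table lookup (and possibly an insert).
    ++hash_lookups_;
    auto it = ids_.find(key);
    if (it == ids_.end()) {
      it = ids_.emplace(key, static_cast<int>(ids_.size())).first;
    }
    has_last_ = true;
    last_key_ = key;
    last_id_ = it->second;
    return last_id_;
  }

  std::size_t hash_lookups() const { return hash_lookups_; }

 private:
  std::unordered_map<std::string, int> ids_;
  std::string last_key_;
  int last_id_ = -1;
  bool has_last_ = false;
  std::size_t hash_lookups_ = 0;
};

// Routes `keys` in order and reports how many hash table lookups were needed.
std::size_t CountHashLookups(const std::vector<std::string>& keys) {
  PartitionLookup lookup;
  for (const auto& k : keys) lookup.Find(k);
  return lookup.hash_lookups();
}
```

Under these assumptions, an unpartitioned insert (every row has the same empty
key) needs a single lookup regardless of row count, and input sorted by
partition key needs one lookup per distinct key. Input that alternates between
keys still pays one lookup per row.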

> optimize hdfs-table-sink output partition hashing
> -------------------------------------------------
>
>                 Key: IMPALA-708
>                 URL: https://issues.apache.org/jira/browse/IMPALA-708
>             Project: IMPALA
>          Issue Type: Task
>          Components: Backend
>    Affects Versions: Impala 1.0, Impala 1.2
>            Reporter: Nong Li
>            Priority: Minor
>              Labels: poc
>
> Looking at some basic profiling while doing an unpartitioned insert, it looks 
> like we have some very low hanging fruit:
>      226  16.2%  16.2%      226  16.2% boost::unordered_detail::hash_table::find_iterator
>       <-- Need to track down where this is (we need better cluster tools) but 
>       this seems like a big waste of time.
>      178  12.8%  29.0%      178  12.8% impala::HdfsParquetTableWriter::AppendRowBatch
>      157  11.3%  40.3%      157  11.3% impala::HdfsParquetTableWriter::ColumnWriter::EncodeValue@9ff700
>      131   9.4%  49.7%      131   9.4% __strncmp_sse42
>      129   9.3%  59.0%      133   9.6% impala::TextConverter::WriteSlot
>      109   7.8%  66.9%      109   7.8% impala::DelimitedTextParser::ParseFieldLocations
>       94   6.8%  73.6%       94   6.8% snappy::internal::CompressFragment
>       71   5.1%  78.7%       71   5.1% impala::HdfsParquetTableWriter::ColumnWriter::EncodeValue@9fca90
>       56   4.0%  82.7%       56   4.0% impala::HdfsScanner::WriteCompleteTuple
>       36   2.6%  85.3%       36   2.6% impala::HdfsParquetTableWriter::ColumnWriter::EncodeValue@9fd3f0
>       34   2.4%  87.8%       34   2.4% impala::HashUtil::Hash
>       34   2.4%  90.2%       34   2.4% impala::StringParser::StringToIntInternal@801fd0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

