[ https://issues.apache.org/jira/browse/IMPALA-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dan Hecht reassigned IMPALA-708:
--------------------------------
Assignee: (was: Dan Hecht)
This is less relevant with clustering, where the hashing no longer happens for
each row.
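
To illustrate why per-row hashing matters less once rows arrive grouped by
partition key, here is a minimal sketch of the general idea: remember the last
key and its writer, and only consult the hash table when the key changes. This
is not Impala's actual code. PartitionedSink, PartitionWriter, AppendRow, and
the string-valued key are hypothetical names and simplifications chosen for
the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a per-partition output writer.
struct PartitionWriter {
  std::size_t rows_written = 0;
};

// Hypothetical sink. It counts hash-table lookups so that the effect of
// caching the last partition is observable.
class PartitionedSink {
 public:
  // Appends one row whose partition key is `key`.
  void AppendRow(const std::string& key) {
    if (last_writer_ == nullptr || key != last_key_) {
      // First row, or the key changed: pay for one hash lookup (or insert).
      ++lookups_;
      // Pointers to std::unordered_map elements stay valid across rehashes,
      // so caching the address is safe as long as nothing is erased.
      last_writer_ = &writers_[key];
      last_key_ = key;
    }
    assert(last_writer_ != nullptr);
    ++last_writer_->rows_written;
  }

  // Convenience helper: appends every key in order.
  void AppendRows(const std::vector<std::string>& keys) {
    for (const std::string& key : keys) AppendRow(key);
  }

  std::size_t lookups() const { return lookups_; }

  std::size_t RowsFor(const std::string& key) const {
    auto it = writers_.find(key);
    return it == writers_.end() ? 0 : it->second.rows_written;
  }

 private:
  std::unordered_map<std::string, PartitionWriter> writers_;
  std::string last_key_;
  PartitionWriter* last_writer_ = nullptr;
  std::size_t lookups_ = 0;
};
```

In this sketch, an unpartitioned insert (every row has the same empty key)
costs a single lookup regardless of row count, and input clustered by key
costs one lookup per run of equal keys rather than one per row.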
> optimize hdfs-table-sink output partition hashing
> -------------------------------------------------
>
> Key: IMPALA-708
> URL: https://issues.apache.org/jira/browse/IMPALA-708
> Project: IMPALA
> Issue Type: Task
> Components: Backend
> Affects Versions: Impala 1.0, Impala 1.2
> Reporter: Nong Li
> Priority: Minor
> Labels: poc
>
> Looking at some basic profiling while doing an unpartitioned insert, it looks
> like we have some very low hanging fruit:
>   226  16.2%  16.2%   226  16.2%  boost::unordered_detail::hash_table::find_iterator
>        <-- Need to track down where this is (we need better cluster tools) but
>        this seems like a big waste of time.
>   178  12.8%  29.0%   178  12.8%  impala::HdfsParquetTableWriter::AppendRowBatch
>   157  11.3%  40.3%   157  11.3%  impala::HdfsParquetTableWriter::ColumnWriter::EncodeValue@9ff700
>   131   9.4%  49.7%   131   9.4%  __strncmp_sse42
>   129   9.3%  59.0%   133   9.6%  impala::TextConverter::WriteSlot
>   109   7.8%  66.9%   109   7.8%  impala::DelimitedTextParser::ParseFieldLocations
>    94   6.8%  73.6%    94   6.8%  snappy::internal::CompressFragment
>    71   5.1%  78.7%    71   5.1%  impala::HdfsParquetTableWriter::ColumnWriter::EncodeValue@9fca90
>    56   4.0%  82.7%    56   4.0%  impala::HdfsScanner::WriteCompleteTuple
>    36   2.6%  85.3%    36   2.6%  impala::HdfsParquetTableWriter::ColumnWriter::EncodeValue@9fd3f0
>    34   2.4%  87.8%    34   2.4%  impala::HashUtil::Hash
>    34   2.4%  90.2%    34   2.4%  impala::StringParser::StringToIntInternal@801fd0