[
https://issues.apache.org/jira/browse/HIVE-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721942#action_12721942
]
Zheng Shao commented on HIVE-396:
---------------------------------
Q: Why is the Hive program faster than the Hadoop app for the first query?
A: This is definitely possible in a lot of situations.
In this particular case, the main reason is that Hive's implementation of LIKE
uses Text, while the Hadoop app's implementation used String.find(). We used
the Hadoop code from the SIGMOD 2009 paper so that the comparison would be
consistent.
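To illustrate the kind of difference involved, here is a minimal sketch, not
the actual Hive or Hadoop benchmark code. It assumes the gain comes from
matching directly on the UTF-8 bytes that Hadoop's Text holds, instead of
decoding every row into a new String first. The class and method names are
hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Hive's LIKE implementation.
class SubstringMatch {

    // Approach A: decode each record into a fresh String, then search it.
    // This allocates and decodes once per row.
    static boolean containsViaString(byte[] record, int length, String pattern) {
        String row = new String(record, 0, length, StandardCharsets.UTF_8);
        return row.indexOf(pattern) >= 0;
    }

    // Approach B: search the UTF-8 bytes in place, with no per-row allocation.
    // A byte-level match is a valid substring match because a well-formed
    // UTF-8 pattern can only match at character boundaries of well-formed text.
    static boolean containsViaBytes(byte[] record, int length, byte[] pattern) {
        if (pattern.length == 0) {
            return true;
        }
        for (int i = 0; i + pattern.length <= length; i++) {
            int j = 0;
            while (j < pattern.length && record[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return true;
            }
        }
        return false;
    }
}
```

Both methods give the same answer; the second only avoids the per-row decode
and allocation, which matters when the mapper processes many rows.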
While it's possible to improve the Hadoop code in this particular case, there
are cases where it is very hard to apply the same optimization to each and
every Hadoop application. For example, the map-side join (HIVE-195) provides
much better efficiency for joining a very small table with any other table,
without using a reducer. Another case is that the object model in Hive is
different from Hadoop's - we reuse the same object across different rows.
Details of this are in the org.apache.hadoop.hive.serde package.
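The map-side join idea mentioned above can be sketched as follows. This is a
simplified illustration under stated assumptions, not the HIVE-195
implementation: the small table is assumed to fit in memory, rows are modeled
as {key, value} string pairs, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a map-side (hash) join; not Hive's actual code.
class MapSideJoinSketch {

    // Load the small table into an in-memory hash table, once per mapper.
    // Each row is {joinKey, value}.
    static Map<String, List<String>> buildHashTable(List<String[]> smallTable) {
        Map<String, List<String>> table = new HashMap<>();
        for (String[] row : smallTable) {
            table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return table;
    }

    // Stream the other table through the mapper and probe the hash table for
    // each row. No shuffle or reducer is needed because every mapper holds the
    // whole small table.
    static List<String> join(Map<String, List<String>> small, List<String[]> bigTable) {
        List<String> out = new ArrayList<>();
        for (String[] row : bigTable) {
            List<String> matches = small.get(row[0]);
            if (matches == null) {
                continue; // inner join: drop rows without a match
            }
            for (String match : matches) {
                out.add(row[0] + "\t" + row[1] + "\t" + match);
            }
        }
        return out;
    }
}
```

A hand-written Hadoop application would need this logic added separately for
each job, whereas a query engine can apply it wherever one side of the join is
small enough.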
> Hive performance benchmarks
> ---------------------------
>
> Key: HIVE-396
> URL: https://issues.apache.org/jira/browse/HIVE-396
> Project: Hadoop Hive
> Issue Type: New Feature
> Reporter: Zheng Shao
> Attachments: hive_benchmark_2009-06-18.pdf,
> hive_benchmark_2009-06-18.tar.gz
>
>
> We need some performance benchmarks to measure and track the performance
> improvements of Hive.
> Some references:
> PIG performance benchmarks PIG-200
> PigMix: http://wiki.apache.org/pig/PigMix