Re: Review Request 30739: HIVE-9574 Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]

Jimmy Xiang Mon, 09 Feb 2015 10:49:03 -0800


> On Feb. 9, 2015, 2:51 a.m., Rui Li wrote:
> >
> 
> Rui Li wrote:
>     A high-level question: do we still need two buffers? And does it make 
> sense to use something like a queue instead of an array as the buffer?


A queue should work too. Using two buffers makes it easier to switch between reading
and writing, and the switch itself is cheap here. For RowContainer, switching is
expensive because of first()/clear(), etc.
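A minimal sketch of the two-buffer scheme, for illustration only. It is not the HiveKVResultCache code: the class name, `readCursor`, and `readLimit` are hypothetical, and only `writeCursor` comes from this thread. It shows why switching is cheap: the writer fills one array, the reader drains the other, and a switch is a reference swap that copies no rows.

```java
import java.util.NoSuchElementException;

/**
 * Hypothetical two-buffer cache: add() fills writeBuffer, next() drains readBuffer.
 * When the reader runs dry, the arrays are swapped by reference (O(1), no copying).
 */
class TwoBufferSketch<T> {
  private Object[] writeBuffer;
  private Object[] readBuffer;
  private int writeCursor = 0; // number of rows currently in writeBuffer
  private int readCursor = 0;  // index of the next row to return from readBuffer
  private int readLimit = 0;   // number of valid rows in readBuffer

  TwoBufferSketch(int capacity) {
    writeBuffer = new Object[capacity];
    readBuffer = new Object[capacity];
  }

  /** Returns false when writeBuffer is full; a real cache would spill at this point. */
  boolean add(T row) {
    if (writeCursor == writeBuffer.length) {
      return false;
    }
    writeBuffer[writeCursor++] = row;
    return true;
  }

  boolean hasNext() {
    return readCursor < readLimit || writeCursor > 0;
  }

  @SuppressWarnings("unchecked")
  T next() {
    if (readCursor >= readLimit) {
      if (writeCursor == 0) {
        throw new NoSuchElementException();
      }
      switchBuffers();
    }
    T row = (T) readBuffer[readCursor];
    readBuffer[readCursor++] = null; // drop the reference so the row can be collected
    return row;
  }

  // Swap roles: pending writes become readable, and the drained array is reused for writes.
  private void switchBuffers() {
    Object[] tmp = readBuffer;
    readBuffer = writeBuffer;
    writeBuffer = tmp;
    readLimit = writeCursor;
    readCursor = 0;
    writeCursor = 0;
  }
}
```

A queue would avoid the explicit swap, but with two arrays each side touches only its own cursor, which keeps the read/write bookkeeping simple.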


> On Feb. 9, 2015, 2:51 a.m., Rui Li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java, 
> > line 54
> > <https://reviews.apache.org/r/30739/diff/4/?file=853475#file853475line54>
> >
> >     If I understand correctly, this can be renamed to something like 
> > IN_MEMORY_NUM_ROWS?

Yes, you are right; either name works. Is there a strong reason to rename it?


> On Feb. 9, 2015, 2:51 a.m., Rui Li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java, 
> > line 76
> > <https://reviews.apache.org/r/30739/diff/4/?file=853475#file853475line76>
> >
> >     Do we need a parameter here? Seems it can just use writeCursor?

You are right. Using writeCursor is better.


> On Feb. 9, 2015, 2:51 a.m., Rui Li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java, 
> > line 236
> > <https://reviews.apache.org/r/30739/diff/4/?file=853475#file853475line236>
> >
> >     I suppose this is to avoid frequent buffer switches? But why the magic 
> > number 1?

Right. If it is 1, there is no need to switch buffers. For any other number, we need
to switch anyway. I assume there are many scenarios where there is just one row.
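A hypothetical sketch of the one-row special case, assuming a two-array cache like the one discussed above. `OneRowFastPathSketch` and `switchCount` are illustrative names, not Hive code; the counter exists only to make skipped switches visible. When exactly one row is pending, next() returns it straight from the write buffer; with more rows it swaps buffers as usual.

```java
import java.util.NoSuchElementException;

/** Hypothetical two-buffer cache whose next() skips the switch when one row is pending. */
class OneRowFastPathSketch<T> {
  private Object[] writeBuffer;
  private Object[] readBuffer;
  private int writeCursor = 0; // rows pending in writeBuffer
  private int readCursor = 0;  // next row to return from readBuffer
  private int readLimit = 0;   // valid rows in readBuffer
  int switchCount = 0;         // for illustration: how many real switches happened

  OneRowFastPathSketch(int capacity) {
    writeBuffer = new Object[capacity];
    readBuffer = new Object[capacity];
  }

  void add(T row) {
    if (writeCursor == writeBuffer.length) {
      throw new IllegalStateException("buffer full; a real cache would spill here");
    }
    writeBuffer[writeCursor++] = row;
  }

  @SuppressWarnings("unchecked")
  T next() {
    if (readCursor < readLimit) {
      T row = (T) readBuffer[readCursor];
      readBuffer[readCursor++] = null;
      return row;
    }
    if (writeCursor == 0) {
      throw new NoSuchElementException();
    }
    if (writeCursor == 1) {
      // Fast path: hand back the single pending row without swapping buffers.
      T row = (T) writeBuffer[0];
      writeBuffer[0] = null;
      writeCursor = 0;
      return row;
    }
    // General case: swap so the reader can drain all pending rows.
    Object[] tmp = readBuffer;
    readBuffer = writeBuffer;
    writeBuffer = tmp;
    readLimit = writeCursor;
    readCursor = 0;
    writeCursor = 0;
    switchCount++;
    T row = (T) readBuffer[readCursor];
    readBuffer[readCursor++] = null;
    return row;
  }
}
```

With one add() followed by one next(), as when each input record produces a single output row, switchCount stays at 0.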


- Jimmy


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30739/#review71597
-----------------------------------------------------------


On Feb. 7, 2015, 3:09 a.m., Jimmy Xiang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30739/
> -----------------------------------------------------------
> 
> (Updated Feb. 7, 2015, 3:09 a.m.)
> 
> 
> Review request for hive, Rui Li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9574
>     https://issues.apache.org/jira/browse/HIVE-9574
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The result KV cache no longer uses RowContainer, since RowContainer has logic we
> don't need, which adds some overhead. We don't start lazy computing right away;
> instead, we wait until the cache is close to spilling.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java 78ab680
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java 8ead0cb
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 7a09b4d
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunctionResultList.java e92e299
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 070ea4d
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java d4ff37c
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/KryoSerializer.java 286816b
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestHiveKVResultCache.java 0df4598
> 
> Diff: https://reviews.apache.org/r/30739/diff/
> 
> 
> Testing
> -------
> 
> Unit tests; tested on a cluster.
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>
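The Description's idea of not computing lazily right away, and instead waiting until the cache is close to spilling, can be sketched as an iterator that, once its cache is empty, keeps processing input until the cache reaches a threshold or the input runs out, instead of returning the first output as soon as it appears. This is a hypothetical illustration, not the HiveBaseFunctionResultList code: `BatchedResultIterator`, `processor`, and `threshold` are assumed names, the threshold stands in for whatever in-memory row limit the real cache uses, and spilling to disk is omitted.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

/**
 * Hypothetical batching iterator: refills its cache in bursts of up to `threshold`
 * outputs rather than handing back each output the moment it is produced.
 */
class BatchedResultIterator<I, O> implements Iterator<O> {
  private final Iterator<I> input;
  private final BiConsumer<I, Consumer<O>> processor; // emits zero or more outputs per input
  private final int threshold;                        // stand-in for "close to spill"
  private final ArrayDeque<O> cache = new ArrayDeque<>(); // outputs must be non-null

  BatchedResultIterator(Iterator<I> input, BiConsumer<I, Consumer<O>> processor, int threshold) {
    this.input = input;
    this.processor = processor;
    this.threshold = Math.max(1, threshold);
  }

  @Override
  public boolean hasNext() {
    if (cache.isEmpty()) {
      // Refill: keep consuming input until the cache reaches the threshold or input runs out.
      while (input.hasNext() && cache.size() < threshold) {
        processor.accept(input.next(), cache::add);
      }
    }
    return !cache.isEmpty();
  }

  @Override
  public O next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return cache.poll();
  }
}
```

Processing input in bursts amortizes the per-call overhead of switching between producing and consuming, at the cost of holding up to roughly `threshold` outputs in memory.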
