[ 
https://issues.apache.org/jira/browse/HIVE-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-7873:
------------------------------
    Attachment: HIVE-7873.1-spark.patch

Attached a patch that re-enables the lazy HiveBaseFunctionResultList. A separate 
RowContainer is used to work around RowContainer's no-write-after-read 
limitation. The patch also fixes a concurrency issue in HiveKVResultCache. 
The synchronized keyword is used instead of a reentrant lock, since I assume 
few threads will access the cache.
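
To illustrate the locking choice, here is a minimal sketch of a cache guarded by the synchronized keyword. It is not the code from the patch: the class SynchronizedResultCache and its methods are hypothetical stand-ins, and the real HiveKVResultCache may differ in both API and internals.

```java
import java.util.ArrayDeque;
import java.util.NoSuchElementException;
import java.util.Queue;

/** Hypothetical stand-in for a key/value result cache; not the real HiveKVResultCache. */
class SynchronizedResultCache<K, V> {

    /** Simple immutable key/value pair. */
    static final class Entry<K, V> {
        final K key;
        final V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    // FIFO buffer of cached records; every access goes through a synchronized method.
    private final Queue<Entry<K, V>> entries = new ArrayDeque<>();

    /** Appends a record. The monitor on 'this' serializes writers and readers. */
    synchronized void add(K key, V value) {
        entries.add(new Entry<>(key, value));
    }

    /** Returns true if at least one record is waiting to be read. */
    synchronized boolean hasNext() {
        return !entries.isEmpty();
    }

    /** Removes and returns the oldest record. */
    synchronized Entry<K, V> next() {
        Entry<K, V> e = entries.poll();
        if (e == null) {
            throw new NoSuchElementException("cache is empty");
        }
        return e;
    }

    /** Number of records currently cached. */
    synchronized int size() {
        return entries.size();
    }
}
```

With little or no contention, acquiring an uncontended monitor is cheap, which is why plain synchronized can be a reasonable choice over java.util.concurrent.locks.ReentrantLock here.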

In my tests, the synchronization has no noticeable overhead when no other 
thread is accessing the cache. If each processNextRecord() call doesn't dump 
too many records into the cache, the lazy result list performs very well. 
However, if each processNextRecord() call dumps many more records than the 
cache can hold in memory, performance degrades.
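
For readers unfamiliar with the idea, a lazy result list can be sketched roughly as follows. This is an illustration only, under assumed names: LazyResultList, its processor callback, and the processedInputs counter are all hypothetical, and a plain in-memory queue stands in for the cache (no spilling to disk is modeled).

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Queue;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

/** Hypothetical sketch of a lazily evaluated result list; not the Hive class. */
class LazyResultList<I, O> implements Iterable<O> {
    private final Iterator<I> input;
    // Plays the role of processNextRecord(): consumes one input record and
    // emits zero or more output records through the supplied collector.
    private final BiConsumer<I, Consumer<O>> processor;
    // In-memory stand-in for the result cache.
    private final Queue<O> buffer = new ArrayDeque<>();
    // Counts how many input records have been processed so far.
    int processedInputs = 0;

    LazyResultList(Iterator<I> input, BiConsumer<I, Consumer<O>> processor) {
        this.input = input;
        this.processor = processor;
    }

    @Override
    public Iterator<O> iterator() {
        return new Iterator<O>() {
            @Override
            public boolean hasNext() {
                // Process input only when the buffer has run dry.
                while (buffer.isEmpty() && input.hasNext()) {
                    processor.accept(input.next(), buffer::add);
                    processedInputs++;
                }
                return !buffer.isEmpty();
            }

            @Override
            public O next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return buffer.poll();
            }
        };
    }
}
```

In this sketch, input records are processed only as the consumer asks for output, so the buffer holds at most the output of one processor call at a time. That also shows why a call that emits far more records than fit in memory is the costly case.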

> Re-enable lazy HiveBaseFunctionResultList
> -----------------------------------------
>
>                 Key: HIVE-7873
>                 URL: https://issues.apache.org/jira/browse/HIVE-7873
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Brock Noland
>            Assignee: Jimmy Xiang
>              Labels: Spark-M4, spark
>         Attachments: HIVE-7873.1-spark.patch
>
>
> We removed this optimization in HIVE-7799.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
