[ https://issues.apache.org/jira/browse/HIVE-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-5454:
------------------------------

    Attachment: D13317.1.patch

QwertyManiac requested code review of "HIVE-5454 [jira] HCatalog runs a 
partition listing with an empty filter".

Reviewers: JIRA

HIVE-5454. HCatalog runs a partition listing with an empty filter.

Modify HCat's input format so that it always loads partitions with a filter 
when one is available.

This is a regression caused by HCATALOG-527: the way HCatLoader calls 
HCatInputFormat makes it perform two partition lookups, once without the 
filter and then again with the filter.

For tables with a large number of partitions (100000, say), the unfiltered 
lookup proves fatal both to the client ("Read timed out" errors from 
ThriftMetaStoreClient because the server doesn't respond) and to the server 
(too much data loaded into the cache, OOME, or slowdown).

The fix would be to use a single call that also passes the partition filter 
information, as was the case in the HCatalog 0.4 sources before HCATALOG-527.

(HCatalog-release-wise, this affects all 0.5.x users)
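
The difference between the two call patterns can be sketched with a small 
self-contained simulation. Everything in it (FakeMetastore, loadTwoStep, 
loadSingleCall) is hypothetical and only stands in for the metastore client 
and the HCatInputFormat calls; it is not the HCatalog API and not the code in 
the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class PartitionLookupSketch {

    /** Hypothetical stand-in for a metastore; it counts calls and partitions returned. */
    static class FakeMetastore {
        private final List<String> partitions = new ArrayList<>();
        int listCalls = 0;
        long partitionsReturned = 0;

        FakeMetastore(int count) {
            for (int d = 0; d < count; d++) {
                partitions.add("day=" + d);
            }
        }

        /** A null filter means "return every partition", which is the costly case. */
        List<String> listPartitions(Predicate<String> filter) {
            listCalls++;
            List<String> out = new ArrayList<>();
            for (String p : partitions) {
                if (filter == null || filter.test(p)) {
                    out.add(p);
                }
            }
            partitionsReturned += out.size();
            return out;
        }
    }

    /** Regressed pattern: an unfiltered listing first, then a second listing with the filter. */
    static List<String> loadTwoStep(FakeMetastore ms, Predicate<String> filter) {
        ms.listPartitions(null);          // whole table is fetched and then discarded
        return ms.listPartitions(filter); // only this result is actually needed
    }

    /** Proposed pattern: a single listing that carries the filter. */
    static List<String> loadSingleCall(FakeMetastore ms, Predicate<String> filter) {
        return ms.listPartitions(filter);
    }

    public static void main(String[] args) {
        Predicate<String> filter = p -> p.equals("day=42");

        FakeMetastore a = new FakeMetastore(100000);
        loadTwoStep(a, filter);
        System.out.println("two-step:    calls=" + a.listCalls
                + " partitions=" + a.partitionsReturned);

        FakeMetastore b = new FakeMetastore(100000);
        loadSingleCall(b, filter);
        System.out.println("single call: calls=" + b.listCalls
                + " partitions=" + b.partitionsReturned);
    }
}
```

In this toy model the two-step path transfers the whole table once before the 
filtered result, while the single call transfers only the matching partition. 
The real cost additionally depends on metastore caching and Thrift timeouts, 
which the sketch does not model.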

TEST PLAN
  Built hcat module and ran all its existing tests

REVISION DETAIL
  https://reviews.facebook.net/D13317

AFFECTED FILES
  hcatalog/core/src/main/java/org/apache/hive/hcatalog/data/transfer/impl/HCatInputFormatReader.java
  hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatInputFormat.java
  hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/HCatMapReduceTest.java
  hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/HCatLoader.java

To: JIRA, QwertyManiac


> HCatalog runs a partition listing with an empty filter
> ------------------------------------------------------
>
>                 Key: HIVE-5454
>                 URL: https://issues.apache.org/jira/browse/HIVE-5454
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.12.0
>            Reporter: Harsh J
>         Attachments: D13317.1.patch



