[
https://issues.apache.org/jira/browse/HIVE-20330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700611#comment-16700611
]
Hive QA commented on HIVE-20330:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12949674/HIVE-20330.4.patch
{color:red}ERROR:{color} -1 due to build exiting with an error
Test results:
https://builds.apache.org/job/PreCommit-HIVE-Build/15075/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15075/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15075/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Tests exited with: Exception: Patch URL
https://issues.apache.org/jira/secure/attachment/12949674/HIVE-20330.4.patch
was found in seen patch url's cache and a test was probably run already on it.
Aborting...
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12949674 - PreCommit-HIVE-Build
> HCatLoader cannot handle multiple InputJobInfo objects for a job with
> multiple inputs
> -------------------------------------------------------------------------------------
>
> Key: HIVE-20330
> URL: https://issues.apache.org/jira/browse/HIVE-20330
> Project: Hive
> Issue Type: Bug
> Components: HCatalog
> Reporter: Adam Szita
> Assignee: Adam Szita
> Priority: Major
> Attachments: HIVE-20330.0.patch, HIVE-20330.1.patch,
> HIVE-20330.2.patch, HIVE-20330.3.patch, HIVE-20330.4.patch
>
>
> While running performance tests on Pig (0.12 and 0.17) we've observed a huge
> performance drop in a workload that has multiple inputs from HCatLoader.
> The reason is that for a particular MR job with multiple Hive tables as
> input, Pig calls {{setLocation}} on each {{LoadFunc}} ({{HCatLoader}})
> instance, but only one table's information (an InputJobInfo instance) gets
> tracked in the JobConf, under config key {{HCatConstants.HCAT_KEY_JOB_INFO}}.
> Each such call overwrites the preexisting value, and thus only the last
> table's information is considered when Pig calls {{getStatistics}} to
> estimate the required reducer count.
> For example, with 2 input tables of 256GB and 1MB respectively, Pig will
> query the size information from HCat for both of them, but it will see
> either 1MB+1MB=2MB or 256GB+256GB=0.5TB, depending on input order in the
> execution plan's DAG.
> It should of course see 256.00097GB in total and use 257 reducers by default
> accordingly.
> In the unlucky case the total is seen as 2MB, and 1 reducer has to struggle
> with the actual 256.00097GB...
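
The overwrite described above can be modelled in a few lines. This is a simplified, language-neutral sketch rather than HCatalog or Pig code: a plain dict stands in for the JobConf, the key string {{hcat.job.info}} and the table names are hypothetical, and the 1GB-per-reducer divisor is an assumption chosen to reproduce the 257 figure from the description. The per-table key variant shows only one possible way to keep the entries apart and is not necessarily what the attached patches do.

```python
import math

# Hypothetical stand-in for the value of HCatConstants.HCAT_KEY_JOB_INFO.
JOB_INFO_KEY = "hcat.job.info"

# Assumption: 1GB of input per reducer, which matches the "257 reducers" figure.
BYTES_PER_REDUCER = 1024 ** 3

# Hypothetical table names with the sizes from the description (256GB and 1MB).
TABLE_SIZES = {
    "big_table": 256 * 1024 ** 3,
    "small_table": 1024 ** 2,
}


def estimate_reducers(total_bytes):
    """Round the input size up to whole reducers, with at least one reducer."""
    return max(1, math.ceil(total_bytes / BYTES_PER_REDUCER))


def estimate_with_single_key(input_order):
    """Model of the bug: every setLocation call writes to the same key."""
    conf = {}
    for table in input_order:
        conf[JOB_INFO_KEY] = table  # overwrites the previous table's entry
    # Each loader asks for statistics, but all of them read the same key,
    # so the last table's size is counted once per input.
    total = sum(TABLE_SIZES[conf[JOB_INFO_KEY]] for _ in input_order)
    return estimate_reducers(total)


def estimate_with_per_table_keys(input_order):
    """One possible remedy (assumed): keep a separate entry per table."""
    conf = {}
    for table in input_order:
        conf[f"{JOB_INFO_KEY}.{table}"] = table
    total = sum(
        TABLE_SIZES[conf[f"{JOB_INFO_KEY}.{table}"]] for table in input_order
    )
    return estimate_reducers(total)


if __name__ == "__main__":
    print(estimate_with_single_key(["big_table", "small_table"]))
    print(estimate_with_single_key(["small_table", "big_table"]))
    print(estimate_with_per_table_keys(["big_table", "small_table"]))
```

With a single shared key the estimate depends entirely on which table is set last, while the per-table variant sums both sizes regardless of input order.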
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)