[ https://issues.apache.org/jira/browse/HIVE-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573438#comment-14573438 ]
Aihua Xu commented on HIVE-10754:
---------------------------------
OK, I found out why. These two are different: the JobConf is not cloned when we call
{{new Job(conf)}}, whereas {{Job.getInstance(configuration)}} clones the
configuration together with the properties inside it.
{{HCatInputFormat.setInput(job, dbName, tableName, getPartitionFilterString());}}
sets the value of {{HCatConstants.HCAT_KEY_JOB_INFO}}. Currently, since {{clone}}
is effectively the same as {{job}} (its configuration is not copied), that value is
not captured when we later try to pull the differences into {{udfProps}}.
So with the old interface, the clone is not really a clone.
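To make the failure mode concrete, here is a minimal Java sketch of the snapshot-and-diff pattern described above. It does not use Hadoop or HCatalog: {{Conf}}, {{setInput}}, and the property keys {{example.existing}} and {{example.job.info}} are hypothetical stand-ins for Hadoop's Configuration, {{HCatInputFormat.setInput}}, and {{HCatConstants.HCAT_KEY_JOB_INFO}}. When the "clone" is the same object as the job's configuration, the diff comes out empty. When it is an independent copy taken before {{setInput}}, the new property shows up in {{udfProps}}.

```java
import java.util.HashMap;
import java.util.Map;

final class CloneDiffDemo {

    // Hypothetical stand-in for Hadoop's Configuration: a mutable key/value map.
    static final class Conf {
        final Map<String, String> props = new HashMap<>();

        Conf() {}

        // Copy constructor: an independent snapshot, i.e. a real clone.
        Conf(Conf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) {
            props.put(key, value);
        }

        String getRaw(String key) {
            return props.get(key);
        }
    }

    // Hypothetical stand-in for HCatInputFormat.setInput: writes job info into the job's conf.
    static void setInput(Conf jobConf, String jobInfo) {
        jobConf.set("example.job.info", jobInfo);
    }

    // Collect every property that is new or changed relative to the snapshot,
    // mirroring the step that copies the differences into udfProps.
    static Map<String, String> diff(Conf jobConf, Conf snapshot) {
        Map<String, String> udfProps = new HashMap<>();
        for (Map.Entry<String, String> e : jobConf.props.entrySet()) {
            String oldValue = snapshot.getRaw(e.getKey());
            if (oldValue == null || !oldValue.equals(e.getValue())) {
                udfProps.put(e.getKey(), e.getValue());
            }
        }
        return udfProps;
    }

    static Map<String, String> run(boolean realClone) {
        Conf jobConf = new Conf();
        jobConf.set("example.existing", "unchanged");
        // realClone == false: the "clone" is the job's own conf (shared state).
        // realClone == true: the clone is a copy taken before setInput runs.
        Conf clone = realClone ? new Conf(jobConf) : jobConf;
        setInput(jobConf, "tbl1-info");
        return diff(jobConf, clone);
    }

    public static void main(String[] args) {
        System.out.println("shared clone: " + run(false));
        System.out.println("real clone:   " + run(true));
    }
}
```

Running {{main}} prints an empty map for the shared case and a single-entry map holding the job info for the copied case. This is why the job info set by {{setInput}} never reaches {{udfProps}} when the clone shares state with the job.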
> Pig+Hcatalog doesn't work properly since we need to clone the Job instance in HCatLoader
> ----------------------------------------------------------------------------------------
>
> Key: HIVE-10754
> URL: https://issues.apache.org/jira/browse/HIVE-10754
> Project: Hive
> Issue Type: Sub-task
> Components: HCatalog
> Affects Versions: 1.2.0
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Attachments: HIVE-10754.patch
>
>
> {noformat}
> Create table tbl1 (key string, value string) stored as rcfile;
> Create table tbl2 (key string, value string);
> insert into tbl1 values ('1', '111');
> insert into tbl2 values ('1', '2');
> {noformat}
> Pig script:
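> {noformat}
> -- Assumed: tbl1 and tbl2 are loaded from the default database via HCatLoader.
> tbl1 = LOAD 'tbl1' USING org.apache.hive.hcatalog.pig.HCatLoader();
> tbl2 = LOAD 'tbl2' USING org.apache.hive.hcatalog.pig.HCatLoader();
>
> src_tbl1 = FILTER tbl1 BY (key == '1');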
> prj_tbl1 = FOREACH src_tbl1 GENERATE
> key as tbl1_key,
> value as tbl1_value,
> '333' as tbl1_v1;
>
> src_tbl2 = FILTER tbl2 BY (key == '1');
> prj_tbl2 = FOREACH src_tbl2 GENERATE
> key as tbl2_key,
> value as tbl2_value;
>
> dump prj_tbl1;
> dump prj_tbl2;
> result = JOIN prj_tbl1 BY (tbl1_key), prj_tbl2 BY (tbl2_key);
> prj_result = FOREACH result
> GENERATE prj_tbl1::tbl1_key AS key1,
> prj_tbl1::tbl1_value AS value1,
> prj_tbl1::tbl1_v1 AS v1,
> prj_tbl2::tbl2_key AS key2,
> prj_tbl2::tbl2_value AS value2;
>
> dump prj_result;
> {noformat}
> The expected result is (1,111,333,1,2), but the actual result is (1,2,333,1,2):
> {{value1}} contains tbl2's value ('2') instead of tbl1's ('111'). We need to
> clone the Job instance in HCatLoader.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)