[
https://issues.apache.org/jira/browse/PIG-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619470#comment-13619470
]
Feng Peng commented on PIG-3222:
--------------------------------
Hi [~daijy], thanks for looking into this!
1. The behavior definitely changed from pig_9 to pig_11. Running the same
program as above under pig_9 produces the following trace:
{noformat}
[1]setStoreFuncUDFContextSignature(samples_testdb.samples_com.twitter.twadoop.dal.pig.StoreFuncTracer('org.apache.hcatalog.pig.HCatStorer','part_dt=20130204T000000Z'))
[1]relToAbsPathForStoreLocation(testdb.samples,hdfs://...)
[2]setStoreFuncUDFContextSignature(1-0_testdb.samples_com.twitter.twadoop.dal.pig.StoreFuncTracer('org.apache.hcatalog.pig.HCatStorer','part_dt=20130204T000000Z'))
[4]setStoreFuncUDFContextSignature(samples_testdb.samples_com.twitter.twadoop.dal.pig.StoreFuncTracer('org.apache.hcatalog.pig.HCatStorer','part_dt=20130204T000000Z'))
[5]setStoreFuncUDFContextSignature(1-1_testdb.samples_com.twitter.twadoop.dal.pig.StoreFuncTracer('org.apache.hcatalog.pig.HCatStorer','part_dt=20130204T000000Z'))
[5]setStoreFuncUDFContextSignature(samples_testdb.samples_com.twitter.twadoop.dal.pig.StoreFuncTracer('org.apache.hcatalog.pig.HCatStorer','part_dt=20130204T000000Z'))
[5]checkSchema(number:int)
[5]setStoreLocation(testdb.samples,Job@1023736867)
[5]getOutputFormat
[6]setStoreFuncUDFContextSignature(samples_testdb.samples_com.twitter.twadoop.dal.pig.StoreFuncTracer('org.apache.hcatalog.pig.HCatStorer','part_dt=20130204T000000Z'))
[6]setStoreLocation(testdb.samples,Job@1713380717)
[7]setStoreFuncUDFContextSignature(samples_testdb.samples_com.twitter.twadoop.dal.pig.StoreFuncTracer('org.apache.hcatalog.pig.HCatStorer','part_dt=20130204T000000Z'))
[7]setStoreLocation(testdb.samples,Job@687736006)
[7]getOutputFormat
{noformat}
Pig_9 instantiates the same StoreFunc 8 times with three different signatures,
versus 3 times with 2 signatures in pig_11 (which is an improvement). However,
in pig_9 ALL the setStoreLocation calls are made with a DETERMINISTIC
signature, which is the concatenation of:
{noformat}
* relation name: samples
* location string: testdb.samples
* storer statement:
com.twitter.twadoop.dal.pig.StoreFuncTracer('org.apache.hcatalog.pig.HCatStorer','part_dt=20130204T000000Z')
{noformat}
This signature can be safely regenerated in the backend, so we can always
retrieve the information stored in the frontend.
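To illustrate the naming scheme described above, here is a minimal sketch that rebuilds a pig_9-style signature from its three parts. The class and method names (SignatureSketch, buildSignature) are hypothetical and not part of Pig, and the underscore separator is inferred from the trace rather than taken from the Pig source.

```java
// Hypothetical helper, NOT part of Pig: rebuilds a pig_9-style signature
// from the three parts listed above. The "_" separator is an assumption
// inferred from the trace output.
final class SignatureSketch {

    static String buildSignature(String relationName,
                                 String location,
                                 String storerStatement) {
        return String.join("_", relationName, location, storerStatement);
    }

    public static void main(String[] args) {
        String sig = buildSignature(
            "samples",
            "testdb.samples",
            "com.twitter.twadoop.dal.pig.StoreFuncTracer("
                + "'org.apache.hcatalog.pig.HCatStorer','part_dt=20130204T000000Z')");
        // Same inputs always yield the same string, on frontend and backend alike.
        System.out.println(sig);
    }
}
```

Because the result depends only on the relation name, the location, and the storer statement, any process that knows those three values can reconstruct the same key.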
2. In HCatStorer.setStoreLocation (and similarly in HCatLoader.setLocation), we
have the following code, which caches information in the UDFContext using
the signature (plus the class) as the key:
{noformat}
Properties udfProps = UDFContext.getUDFContext().getUDFProperties(
    this.getClass(), new String[]{sign});
if (udfProps.containsKey(HCatConstants.HCAT_PIG_STORER_LOCATION_SET)) {
    readConfigFromUDFProps();
} else {
    getConfigFromHiveMetaStore();
    udfProps.put(HCatConstants.HCAT_PIG_STORER_LOCATION_SET, true);
}
{noformat}
For HCatStorer, the problem is that the frontend and backend are not
guaranteed to use the same signature, so the HCatStorer may fail to find its
cached information and break at the backend.
For HCatLoader, the problem is that each instantiation uses a different
signature, so the above caching mechanism never gets a hit. As a result, the
frontend reads from the HiveMetaStore once per unique signature. This is
annoying and inefficient, but it doesn't break the program. That's why the
current ticket only names the HCatStorer.
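The effect of the signature on the caching pattern quoted above can be sketched without Pig or HCatalog on the classpath. Everything here is a stand-in: UdfContextSketch, setLocation, the "location.set" key, and the metaStoreReads counter are hypothetical names that only imitate the keyed-by-class-and-signature lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Stand-in for the UDFContext lookup: properties are keyed by class name
// plus signature. All names here are hypothetical, for illustration only.
final class UdfContextSketch {
    private final Map<String, Properties> store = new HashMap<>();
    int metaStoreReads = 0; // counts simulated HiveMetaStore round trips

    Properties getUDFProperties(Class<?> c, String signature) {
        return store.computeIfAbsent(c.getName() + "|" + signature,
                                     k -> new Properties());
    }

    // Imitates the check-then-populate logic; returns true on a cache hit.
    boolean setLocation(String signature) {
        Properties props = getUDFProperties(UdfContextSketch.class, signature);
        if (props.containsKey("location.set")) {
            return true;
        }
        metaStoreReads++; // cache miss: configuration must be fetched again
        props.put("location.set", "true");
        return false;
    }

    public static void main(String[] args) {
        UdfContextSketch stable = new UdfContextSketch();
        stable.setLocation("samples_testdb.samples");
        stable.setLocation("samples_testdb.samples");
        System.out.println("stable signature reads: " + stable.metaStoreReads);

        UdfContextSketch varying = new UdfContextSketch();
        varying.setLocation("1-0_testdb.samples");
        varying.setLocation("1-1_testdb.samples");
        System.out.println("varying signature reads: " + varying.metaStoreReads);
    }
}
```

With a stable signature the second call hits the cache. With a signature that changes per instantiation, every call misses, which corresponds to the repeated metastore reads described for HCatLoader.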
Can we use the same signature naming scheme as pig_9? As long as the relation
names are unique (and if they are not, we can enforce uniqueness in the
normalization process), the signature for every LoadFunc and StoreFunc will be
unique and can be constructed consistently in both the frontend and backend.
> New UDFContextSignature assignments in Pig 0.11 breaks HCatalog.HCatStorer
> ---------------------------------------------------------------------------
>
> Key: PIG-3222
> URL: https://issues.apache.org/jira/browse/PIG-3222
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.11
> Reporter: Feng Peng
> Labels: hcatalog
> Attachments: PigStorerDemo.java
>
>
> Pig 0.11 assigns a different UDFContextSignature to different invocations of
> the same load/store statement. This change breaks the HCatStorer, which
> assumes that all front-end and back-end invocations of the same store
> statement have the same UDFContextSignature so that it can read the
> previously stored information correctly.
> The related HCatalog code is in
> https://svn.apache.org/repos/asf/incubator/hcatalog/branches/branch-0.5/hcatalog-pig-adapter/src/main/java/org/apache/hcatalog/pig/HCatStorer.java
> (the setStoreLocation() function).