> On Aug. 16, 2012, 7:27 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java,
> >  line 386
> > <https://reviews.apache.org/r/6536/diff/1/?file=138628#file138628line386>
> >
> >     The no_schema_check can now appear in any position, i.e., its no longer 
> > to required for it to be the first argument?

In fact, 'no_schema_check' wasn't used with other options at all. But since I 
was making it possible to use it with other options, I didn't want to restrict 
its position. So 'no_schema_check' can be at any position except between a 
parameter and its value:

no_schema_check, key, value => OK
key, value, no_schema_check => OK
key, no_schema_check, value => not OK


> On Aug. 16, 2012, 7:27 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java,
> >  line 389
> > <https://reviews.apache.org/r/6536/diff/1/?file=138628#file138628line389>
> >
> >     I am a little concerned with this change. For loop counters, to be 
> > conservative, I would avoid making changes in the body of the loop. Its 
> > hard to detect these changes and harder to maintain.
> >     
> >     Is there a better way of implementing this?

I agree that we should be careful about counters. But the problem is that the 
original loop body was implemented with an assumption that every arg is a pair 
of parameter/value, which is no longer true since I am allowing a parameter 
only arg 'no_schema_check'. So I think that I have to make some changes to the 
body of the loop.

Nevertheless, I made increasing counters more explicit in the new patch, so I 
believe that they are more visible now. Please let me know if you have a better 
suggestion.


> On Aug. 16, 2012, 7:27 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java,
> >  line 288
> > <https://reviews.apache.org/r/6536/diff/1/?file=138629#file138629line288>
> >
> >     Can this be changed to return containsGenericUnion(fs, visitedRecords) 
> > to be consistent with the other parts of the code?

No, it cannot. It is inside a loop, so we should not return until the loop is 
over.


> On Aug. 16, 2012, 7:27 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java,
> >  line 318
> > <https://reviews.apache.org/r/6536/diff/1/?file=138629#file138629line318>
> >
> >     Can this be changed to return containsGenericUnion(fs, visitedRecords) 
> > to be consistent with the other parts of the code?

No, it cannot. It is inside a loop, so we should not return until the loop is 
over.


> On Aug. 16, 2012, 7:27 a.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java,
> >  line 866
> > <https://reviews.apache.org/r/6536/diff/1/?file=138630#file138630line866>
> >
> >     Not sure if this holds true for all cases. This is an either or - i.e., 
> > if a sequence of jobs has one failure then the assertion will kick in for 
> > the first one which is actually a false alarm.

Indeed. I modified the code, so now the number of failed jobs (numOfFailedJobs) 
is counted. If expectedToFail is true, numOfFailedJobs must be greater than 0; 
otherwise, it must be equal to 0.


- Cheolsoo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6536/#review10347
-----------------------------------------------------------


On Aug. 16, 2012, 10:18 p.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6536/
> -----------------------------------------------------------
> 
> (Updated Aug. 16, 2012, 10:18 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> -------
> 
> Allow recursive records to be loaded/stored by AvroStorage.
> 
> The changes include:
> 
> 1) Remove the recursive record check from AvroSchema2Pig.
> 2) Modofy inconvert() in AvroSchema2Pig so that it can map recursive records 
> to bytearrays.
> 3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle 
> Avro schema that contains recursive records.
> 4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can 
> be passed to both LoadFunc and StoreFunc.
> 5) Add the recursive record check to AvroSchemaManager. This is needed 
> because 'schema_file' and 'data' cannot refer to avro schema that contains 
> recursive records.
> 
> AvroStorage works as follows:
> 
> 1) PigSchema maps recursive records to bytearrays, so there is discrepancy 
> between Avro schema and Pig schema.
> 2) Recursive records are loaded as tuples even though Pig schema defines them 
> as bytearrays and can be referred to by position (e.g. $0, $1.$0, etc).
> 3) To store recursive records, Avro schema must be provided via the 'schema' 
> or 'same' parameter in StoreFunc. In addition, 'no_schema_check' must be 
> enabled because otherwise schema check will fail due to discrepancy between 
> Avro schema and Pig schema.
> 4) Avro schema cannot be specified by the 'data' or 'schema_file' parameter. 
> This is because AvroSchemaManager cannot handle recursive records for now. 
> The recursive record check is added to AvroSchemaManager, so if Avro schema 
> that contains recursive records is specified by these parameters, an 
> exception is thrown.
> 
> 
> This addresses bug PIG-2875.
>     https://issues.apache.org/jira/browse/PIG-2875
> 
> 
> Diffs
> -----
> 
>   
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchema2Pig.java
>  6b1d2a1 
>   
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java
>  1939d3e 
>   
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
>  c9f7d81 
>   
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
>  e24b495 
>   
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
>  2fab3f7 
>   
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java
>  040234f 
> 
> Diff: https://reviews.apache.org/r/6536/diff/
> 
> 
> Testing
> -------
> 
> New test cases are added as follows:
> 
> 1) Load/store Avro files that contain recursive records in array, map, union, 
> and another record.
> 2) Load Avro files that contains recursive records, generate new relations, 
> apply filters, and store them as non-recursive records.
> 3) Tests for the StoreFunc parameters: no_schema_check, schema, same, 
> schema_file, and data.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>

Reply via email to