> On Aug. 16, 2012, 7:27 a.m., Santhosh Srinivasan wrote: > > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java, > > line 386 > > <https://reviews.apache.org/r/6536/diff/1/?file=138628#file138628line386> > > > > The no_schema_check can now appear in any position, i.e., its no longer > > to required for it to be the first argument?
In fact, 'no_schema_check' wasn't used with other options at all. But since I was making it possible to use it with other options, I didn't want to restrict its position. So 'no_schema_check' can be at any position except between a parameter and its value: no_schema_check, key, value => OK key, value, no_schema_check => OK key, no_schema_check, value => not OK > On Aug. 16, 2012, 7:27 a.m., Santhosh Srinivasan wrote: > > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java, > > line 389 > > <https://reviews.apache.org/r/6536/diff/1/?file=138628#file138628line389> > > > > I am a little concerned with this change. For loop counters, to be > > conservative, I would avoid making changes in the body of the loop. Its > > hard to detect these changes and harder to maintain. > > > > Is there a better way of implementing this? I agree that we should be careful about counters. But the problem is that the original loop body was implemented with an assumption that every arg is a pair of parameter/value, which is no longer true since I am allowing a parameter only arg 'no_schema_check'. So I think that I have to make some changes to the body of the loop. Nevertheless, I made increasing counters more explicit in the new patch, so I believe that they are more visible now. Please let me know if you have a better suggestion. > On Aug. 16, 2012, 7:27 a.m., Santhosh Srinivasan wrote: > > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java, > > line 288 > > <https://reviews.apache.org/r/6536/diff/1/?file=138629#file138629line288> > > > > Can this be changed to return containsGenericUnion(fs, visitedRecords) > > to be consistent with the other parts of the code? No, it cannot. It is inside a loop, so we should not return until the loop is over. > On Aug. 16, 2012, 7:27 a.m., Santhosh Srinivasan wrote: > > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java, > > line 318 > > <https://reviews.apache.org/r/6536/diff/1/?file=138629#file138629line318> > > > > Can this be changed to return containsGenericUnion(fs, visitedRecords) > > to be consistent with the other parts of the code? No, it cannot. It is inside a loop, so we should not return until the loop is over. > On Aug. 16, 2012, 7:27 a.m., Santhosh Srinivasan wrote: > > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java, > > line 866 > > <https://reviews.apache.org/r/6536/diff/1/?file=138630#file138630line866> > > > > Not sure if this holds true for all cases. This is an either or - i.e., > > if a sequence of jobs has one failure then the assertion will kick in for > > the first one which is actually a false alarm. Indeed. I modified the code, so now the number of failed jobs (numOfFailedJobs) is counted. If expectedToFail is true, numOfFailedJobs must be greater than 0; otherwise, it must be equal to 0. - Cheolsoo ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6536/#review10347 ----------------------------------------------------------- On Aug. 16, 2012, 10:18 p.m., Cheolsoo Park wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6536/ > ----------------------------------------------------------- > > (Updated Aug. 16, 2012, 10:18 p.m.) > > > Review request for pig. > > > Description > ------- > > Allow recursive records to be loaded/stored by AvroStorage. > > The changes include: > > 1) Remove the recursive record check from AvroSchema2Pig. > 2) Modofy inconvert() in AvroSchema2Pig so that it can map recursive records > to bytearrays. > 3) Modify containsGenericUnion() in AvroStorageUtils so that it can handle > Avro schema that contains recursive records. > 4) Update the parameter parsing in AvroStorage so that 'no_schema_check' can > be passed to both LoadFunc and StoreFunc. > 5) Add the recursive record check to AvroSchemaManager. This is needed > because 'schema_file' and 'data' cannot refer to avro schema that contains > recursive records. > > AvroStorage works as follows: > > 1) PigSchema maps recursive records to bytearrays, so there is discrepancy > between Avro schema and Pig schema. > 2) Recursive records are loaded as tuples even though Pig schema defines them > as bytearrays and can be referred to by position (e.g. $0, $1.$0, etc). > 3) To store recursive records, Avro schema must be provided via the 'schema' > or 'same' parameter in StoreFunc. In addition, 'no_schema_check' must be > enabled because otherwise schema check will fail due to discrepancy between > Avro schema and Pig schema. > 4) Avro schema cannot be specified by the 'data' or 'schema_file' parameter. > This is because AvroSchemaManager cannot handle recursive records for now. > The recursive record check is added to AvroSchemaManager, so if Avro schema > that contains recursive records is specified by these parameters, an > exception is thrown. > > > This addresses bug PIG-2875. > https://issues.apache.org/jira/browse/PIG-2875 > > > Diffs > ----- > > > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchema2Pig.java > 6b1d2a1 > > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroSchemaManager.java > 1939d3e > > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java > c9f7d81 > > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java > e24b495 > > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java > 2fab3f7 > > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java > 040234f > > Diff: https://reviews.apache.org/r/6536/diff/ > > > Testing > ------- > > New test cases are added as follows: > > 1) Load/store Avro files that contain recursive records in array, map, union, > and another record. > 2) Load Avro files that contains recursive records, generate new relations, > apply filters, and store them as non-recursive records. > 3) Tests for the StoreFunc parameters: no_schema_check, schema, same, > schema_file, and data. > > > Thanks, > > Cheolsoo Park > >