-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8104/#review18077
-----------------------------------------------------------



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment38174>

    Is there a reason why this doesn't support Pushdown projection? I'm ok with 
making that another JIRA ticket to tack on, but it seems to me that there is no 
reason to make AVRO deserialize a field that is never used, and it would be 
easy to leverage AVRO's already existing schema resolution to do this (we have 
an expected reader schema and could just prune unused columns, or somesuch).



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment38166>

    Given that allowRecursive is a boolean, why do you need to box and then 
unbox this? hasOption returns a boolean (at least in the API doc I saw)?



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment38167>

    this should all be log.error imho, since it leads to errors that terminate 
the script.



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment38168>

    I don't love member names that start with capitals (especially when those 
members sound like classes), but I am willing to forego that if you're 
passionate about it :)



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment38169>

    can the schema file not be an invisible file?



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment38170>

    I realize using Long's compareTo is convenient, but this seems like 
unnecessary boxing. why not just compare them directly? I realize this isn't 
performance critical cord, it just stuck out to me, since you could just do a > 
instead...



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment38171>

    isn't this implicit?



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment38172>

    I am a fan of putting stuff like this in PigConstants, though I haven't 
gotten this to catch on. Mainly just so we can see what is actually being set 
in the jobconf but not required at all.



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment38173>

    same as above...basically I think if it halts everything, it should be 
log.error, or just nothing at all. but everything is open to debate :)



src/org/apache/pig/builtin/TrevniStorage.java
<https://reviews.apache.org/r/8104/#comment38175>

    Is it part of Trevni that the metadata is snappy encoded?



src/org/apache/pig/impl/util/avro/AvroRecordWriter.java
<https://reviews.apache.org/r/8104/#comment38176>

    gasp! trailing whitespace! :)



src/org/apache/pig/impl/util/avro/AvroStorageDataConversionUtilities.java
<https://reviews.apache.org/r/8104/#comment38177>

    since I can imagine this failing, I think it may be useful to catch 
ClassPathException here, log some info about the data types at play, and then 
just rethrow it. Just thinking ahead to debugging the case where someone saves 
something with a schema that doesn't match their expected schema. Maybe this 
would be caught upstream first?



src/org/apache/pig/impl/util/avro/AvroStorageDataConversionUtilities.java
<https://reviews.apache.org/r/8104/#comment38178>

    Not a huge fan of this, esp. given that the Exception stack trace will let 
them know where the error happened.



src/org/apache/pig/impl/util/avro/AvroStorageSchemaConversionUtilities.java
<https://reviews.apache.org/r/8104/#comment38179>

    More info will be useful here, esp. since this doesn't handle DateTime, 
BigInteger, BigDecimal



src/org/apache/pig/impl/util/avro/AvroStorageSchemaConversionUtilities.java
<https://reviews.apache.org/r/8104/#comment38180>

    If you have a map with a complex value, such as an array, will it then be a 
tuple wrapped in a tuple? Or does avro not allow this?



src/org/apache/pig/impl/util/avro/AvroStorageSchemaConversionUtilities.java
<https://reviews.apache.org/r/8104/#comment38181>

    might be worth log.debug'ing this, even though they asked for it, it's 
worth knowing when it is applied



src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java
<https://reviews.apache.org/r/8104/#comment38182>

    May want to throw an UnsupportedOperationException instead, as if this is 
being called, it's a more fundamental issue with Pig, separate from write 
related issues.



src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java
<https://reviews.apache.org/r/8104/#comment38184>

    shouldn't this throw an error? Or is avroObject.put() doing something I 
don't expect, perhaps being 1-indexed instead of 0-indexed?



src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java
<https://reviews.apache.org/r/8104/#comment38183>

    What about other types? Bags etc?



src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java
<https://reviews.apache.org/r/8104/#comment38185>

    Is the fields list returned by getSchema().getFields() a copy, instead of 
the original? If it's the original, it'd already be mutated, adn there is no 
reason to reset, no?



src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java
<https://reviews.apache.org/r/8104/#comment38186>

    lol



src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java
<https://reviews.apache.org/r/8104/#comment38187>

    I suppose this doesn't have to be hyper-exact, but this does not take into 
account the fact that classes are always on 8-byte boundaries, as well as the 
overhead of classes themselves.



test/org/apache/pig/builtin/avro/code/pig/directory_test.pig
<https://reviews.apache.org/r/8104/#comment38188>

    trailing whitespace :)


- Jonathan Coveney


On Jan. 4, 2013, 7:22 p.m., Joseph Adler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8104/
> -----------------------------------------------------------
> 
> (Updated Jan. 4, 2013, 7:22 p.m.)
> 
> 
> Review request for pig and Cheolsoo Park.
> 
> 
> Description
> -------
> 
> The current AvroStorage implementation has a lot of issues: it requires old 
> versions of Avro, it copies data much more than needed, and it's verbose and 
> complicated. (One pet peeve of mine is that old versions of Avro don't 
> support Snappy compression.)
> 
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the 
> new implementation is significantly faster, and the code is a lot simpler. 
> Rewriting AvroStorage also enabled me to implement support for Trevni.
> 
> This is the latest version of the patch, complete with test cases and 
> TrevniStorage. (Test cases for TrevniStorage are still missing).
> 
> 
> This addresses bug PIG-3015.
>     https://issues.apache.org/jira/browse/PIG-3015
> 
> 
> Diffs
> -----
> 
>   .eclipse.templates/.classpath c7b83b8 
>   ivy.xml 70e8d50 
>   ivy/libraries.properties 7b07c7e 
>   src/org/apache/pig/builtin/AvroStorage.java PRE-CREATION 
>   src/org/apache/pig/builtin/TrevniStorage.java PRE-CREATION 
>   src/org/apache/pig/impl/util/avro/AvroArrayReader.java PRE-CREATION 
>   src/org/apache/pig/impl/util/avro/AvroBagWrapper.java PRE-CREATION 
>   src/org/apache/pig/impl/util/avro/AvroMapWrapper.java PRE-CREATION 
>   src/org/apache/pig/impl/util/avro/AvroRecordReader.java PRE-CREATION 
>   src/org/apache/pig/impl/util/avro/AvroRecordWriter.java PRE-CREATION 
>   src/org/apache/pig/impl/util/avro/AvroStorageDataConversionUtilities.java 
> PRE-CREATION 
>   src/org/apache/pig/impl/util/avro/AvroStorageSchemaConversionUtilities.java 
> PRE-CREATION 
>   src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java PRE-CREATION 
>   test/commit-tests 5081fbc 
>   test/org/apache/pig/builtin/TestAvroStorage.java PRE-CREATION 
>   test/org/apache/pig/builtin/avro/code/pig/directory_test.pig PRE-CREATION 
>   test/org/apache/pig/builtin/avro/code/pig/identity.pig PRE-CREATION 
>   test/org/apache/pig/builtin/avro/code/pig/identity_ai1_ao2.pig PRE-CREATION 
>   test/org/apache/pig/builtin/avro/code/pig/identity_ao2.pig PRE-CREATION 
>   test/org/apache/pig/builtin/avro/code/pig/identity_blank_first_args.pig 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/code/pig/identity_codec.pig PRE-CREATION 
>   test/org/apache/pig/builtin/avro/code/pig/identity_just_ao2.pig 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/code/pig/namesWithDoubleColons.pig 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/code/pig/recursive_tests.pig PRE-CREATION 
>   test/org/apache/pig/builtin/avro/code/pig/trevni_to_avro.pig PRE-CREATION 
>   test/org/apache/pig/builtin/avro/code/pig/trevni_to_trevni.pig PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/arrays.json PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/arraysAsOutputByPig.json 
> PRE-CREATION 
>   
> test/org/apache/pig/builtin/avro/data/json/recordWithRepeatedSubRecords.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/records.json PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recordsAsOutputByPig.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recordsOfArrays.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recordsOfArraysOfRecords.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recordsSubSchema.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recordsSubSchemaNullable.json 
> PRE-CREATION 
>   
> test/org/apache/pig/builtin/avro/data/json/recordsWithDoubleUnderscores.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recordsWithEnums.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recordsWithFixed.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recordsWithMaps.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recordsWithMapsOfRecords.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recordsWithNullableUnions.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recursiveRecord.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/arrays.avsc PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/arraysAsOutputByPig.avsc 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordWithRepeatedSubRecords.avsc 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/records.avsc PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsAsOutputByPig.avsc 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsOfArrays.avsc PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsOfArraysOfRecords.avsc 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsSubSchema.avsc PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsSubSchemaNullable.avsc 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsWithDoubleUnderscores.avsc 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsWithEnums.avsc PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsWithFixed.avsc PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsWithMaps.avsc PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsWithMapsOfRecords.avsc 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsWithNullableUnions.avsc 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recursiveRecord.avsc PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/simpleRecordsTrevni.avsc 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/testDirectory.avsc PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/testDirectoryCounts.avsc 
> PRE-CREATION 
>   test/unit-tests 7cede06 
> 
> Diff: https://reviews.apache.org/r/8104/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Joseph Adler
> 
>

Reply via email to