-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8104/#review13962
-----------------------------------------------------------


Overall looks great! I haven't gone through the test cases yet, but here are my 
comments so far.


1) I noticed that I cannot load .avro files that are not record types. For 
example, I tried to load a .avro file whose schema is "int" as follows:

[cheolsoo@cheolsoo-mr1-0 pig-svn]$ java -jar avro-tools-1.5.4.jar getschema 
foo2/test_int.avro 
"int"

[cheolsoo@cheolsoo-mr1-0 pig-svn]$ java -jar avro-tools-1.5.4.jar tojson 
foo2/test_int.avro 
1

in = LOAD 'foo2/test_int.avro' USING AvroStorage('int');
DUMP in;

This gives me the following error:

Caused by: java.io.IOException: avroSchemaToResourceSchema only processes 
records

Can only Avro record type be loaded? Or am I doing something wrong?


2) TestAvroStorage needs to be more automated. To run it, I had to run the 
following commands:

ant clean compile-test
cd ./test/org/apache/pig/builtin/avro
python createests.py
cd -
ant clean test -Dtestcase=TestAvroStorage

Ideally, I should be able to run a single command: ant clean 
-Dtestcase=TestAvroStorage. Please let me know if you need help for this.


3) python createests.py fails with the following errors. I suppose that some 
files are missing:

creating data/avro/uncompressed/testDirectoryCounts.avro
Exception in thread "main" java.io.FileNotFoundException: 
data/json/testDirectoryCounts.json (No such file or directory)
...
creating evenFileNameTestDirectoryCounts.avro
Exception in thread "main" java.io.FileNotFoundException: 
data/json/evenFileNameTestDirectoryCounts.json (No such file or directory)
...


4) ant test -Dtestcase=TestAvroStorage fails with the following errors. I 
suppose that this is due to the missing files:

Testcase: testLoadDirectory took 0.005 sec
    FAILED
Testcase: testLoadGlob took 0.004 sec
    FAILED
Testcase: testPartialLoadGlob took 0.005 sec
    FAILED


5) Typo in the name of createests.py. It should be createtests.py.


6) Is createTests.bash needed at all? If not, can you remove it?


I have more comments inline:


ivy/libraries.properties
<https://reviews.apache.org/r/8104/#comment29844>

    Can you update .eclipse.templates/.classpath too?



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29845>

    Can you remove this comment?



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29850>

    Can you change Exception to SchemaParseException?



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29907>

    Wouldn't it be better to make this static?



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29847>

    FileSystem.get(new Configuration()).open() will look for a file on the 
default FS that is specified by Hadoop conf. But it won't work in the following 
case:
    
    1) The default FS is set by installed Hadoop conf as follows:
    <name>fs.default.name</name>
    <value>hdfs://cheolsoo-mr1-0.ent.cloudera.com:8020</value>
    
    2) Run Pig in *local* mode and specify a schema file that is on local FS. 
This gives me the following error:
    java.io.FileNotFoundException: File does not exist: 
/user/cheolsoo/test_record.avsc
    
    3) Even if I specify the uri schema as file:/, this still gives me the 
following error:
    java.lang.IllegalArgumentException: Wrong FS: 
file:/home/cheolsoo/pig-svn/test_record.avsc, expected: 
hdfs://cheolsoo-mr1-0.ent.cloudera.com:8020
    
    
    Can you instead do the following?
    
    Path p = new Path(...);
    (FileSystem.get(p.toUri(), new Configuration()).open(p);
    
    Then, local FS will be used if file:/ is specified in path, and the default 
FS is if no uri scheme is specified.



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29848>

    Same problem as above.



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29849>

    Can you split this into two catch blocks: ParseException and IOException?
    
    It seems incorrect that the help message is printed when FileSystem throws 
an IOExcetion.



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29851>

    Can you filter out files that start with "." as well?



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29852>

    Same as above.



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29853>

    This won't work in the following case. Let's say p matches two dirs, and 
one dir is empty.
    
    p = foo*
    
    foo1
    foo2/bar.avro
    
    I would expect the schema of bar.avro is returned, but I get an IOException 
instead.



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29854>

    Typo: not => No.



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29857>

    Can you rename this class without Fast?



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29856>

    Is this needed?
    
    In the constructor, schema is supposed to be set. If not, there must be an 
error. Shouldn't we throw an exception instead of re-trying to set schema?
    
    Please correct me if I am wrong.



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29855>

    Can you replace ".avro" with AvroOutputFormat.EXT?



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29870>

    Can you move this comment to the checkSchema() method?



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29858>

    Can you move this comment to the prepareToWrite() method?



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29859>

    Can you instead use log.debug(..., e)?



src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29865>

    Can you instead use log.debug(..., e)?



src/org/apache/pig/builtin/TrevniStorage.java
<https://reviews.apache.org/r/8104/#comment29860>

    Can you filter out files that start with "." as well?



src/org/apache/pig/builtin/TrevniStorage.java
<https://reviews.apache.org/r/8104/#comment29861>

    AvroStorage accepts files that do not end ".avro". Shouldn't TrevniStorage 
do the same?



src/org/apache/pig/builtin/TrevniStorage.java
<https://reviews.apache.org/r/8104/#comment29862>

    Is this needed?
    
    In the constructor, schema is supposed to be set. If not, there must be an 
error. Shouldn't we throw an exception instead of re-trying to set schema?



src/org/apache/pig/builtin/TrevniStorage.java
<https://reviews.apache.org/r/8104/#comment29863>

    AvroStorage accepts files that do not end ".avro". Shouldn't TrevniStorage 
do the same?



src/org/apache/pig/impl/util/AvroRecordReader.java
<https://reviews.apache.org/r/8104/#comment29864>

    I can't find where -ignoreErrors is used. I guess that error handling for 
bad files is not implemented yet?



src/org/apache/pig/impl/util/AvroStorageDataConversionUtilities.java
<https://reviews.apache.org/r/8104/#comment29866>

    Can you instead use log.debug(..., e)?



src/org/apache/pig/impl/util/AvroStorageDataConversionUtilities.java
<https://reviews.apache.org/r/8104/#comment29867>

    Can you instead use log.debug(..., e)?



src/org/apache/pig/impl/util/AvroStorageSchemaConversionUtilities.java
<https://reviews.apache.org/r/8104/#comment29871>

    How about a union type that contains a single data type (e.g. ["string"])? 
They're currently supported.



src/org/apache/pig/impl/util/AvroStorageSchemaConversionUtilities.java
<https://reviews.apache.org/r/8104/#comment29868>

    Can you instead use log.debug(..., e)?



src/org/apache/pig/impl/util/AvroTupleWrapper.java
<https://reviews.apache.org/r/8104/#comment29869>

    Can you instead use log.debug(..., e)?


- Cheolsoo Park


On Nov. 17, 2012, 5:28 a.m., Joseph Adler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8104/
> -----------------------------------------------------------
> 
> (Updated Nov. 17, 2012, 5:28 a.m.)
> 
> 
> Review request for pig and Cheolsoo Park.
> 
> 
> Description
> -------
> 
> The current AvroStorage implementation has a lot of issues: it requires old 
> versions of Avro, it copies data much more than needed, and it's verbose and 
> complicated. (One pet peeve of mine is that old versions of Avro don't 
> support Snappy compression.)
> 
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the 
> new implementation is significantly faster, and the code is a lot simpler. 
> Rewriting AvroStorage also enabled me to implement support for Trevni.
> 
> This is the latest version of the patch, complete with test cases and 
> TrevniStorage. (Test cases for TrevniStorage are still missing).
> 
> 
> This addresses bug PIG-3015.
>     https://issues.apache.org/jira/browse/PIG-3015
> 
> 
> Diffs
> -----
> 
>   build.xml 7d468a0 
>   ivy.xml 70e8d50 
>   ivy/libraries.properties 317564f 
>   src/org/apache/pig/builtin/AvroStorage.java PRE-CREATION 
>   src/org/apache/pig/builtin/TrevniStorage.java PRE-CREATION 
>   src/org/apache/pig/impl/util/AvroBagWrapper.java PRE-CREATION 
>   src/org/apache/pig/impl/util/AvroMapWrapper.java PRE-CREATION 
>   src/org/apache/pig/impl/util/AvroRecordReader.java PRE-CREATION 
>   src/org/apache/pig/impl/util/AvroRecordWriter.java PRE-CREATION 
>   src/org/apache/pig/impl/util/AvroStorageDataConversionUtilities.java 
> PRE-CREATION 
>   src/org/apache/pig/impl/util/AvroStorageSchemaConversionUtilities.java 
> PRE-CREATION 
>   src/org/apache/pig/impl/util/AvroTupleWrapper.java PRE-CREATION 
>   test/commit-tests 5081fbc 
>   test/org/apache/pig/builtin/TestAvroStorage.java PRE-CREATION 
>   test/org/apache/pig/builtin/avro/code/pig/directory_test.pig PRE-CREATION 
>   test/org/apache/pig/builtin/avro/code/pig/identity.pig PRE-CREATION 
>   test/org/apache/pig/builtin/avro/code/pig/identity_ai1_ao2.pig PRE-CREATION 
>   test/org/apache/pig/builtin/avro/code/pig/identity_ao2.pig PRE-CREATION 
>   test/org/apache/pig/builtin/avro/code/pig/identity_codec.pig PRE-CREATION 
>   test/org/apache/pig/builtin/avro/code/pig/identity_just_ao2.pig 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/code/pig/recursive_tests.pig PRE-CREATION 
>   test/org/apache/pig/builtin/avro/createTests.bash PRE-CREATION 
>   test/org/apache/pig/builtin/avro/createests.py PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/avro/deflate/records.avro 
> PRE-CREATION 
>   
> test/org/apache/pig/builtin/avro/data/avro/deflate/recordsAsOutputByPig.avro 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/avro/snappy/records.avro PRE-CREATION 
>   
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordWithRepeatedSubRecords.avro
>  PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/avro/uncompressed/records.avro 
> PRE-CREATION 
>   
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsAsOutputByPig.avro
>  PRE-CREATION 
>   
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsOfArrays.avro 
> PRE-CREATION 
>   
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsOfArraysOfRecords.avro
>  PRE-CREATION 
>   
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsSubSchema.avro 
> PRE-CREATION 
>   
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsSubSchemaNullable.avro
>  PRE-CREATION 
>   
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsWithEnums.avro 
> PRE-CREATION 
>   
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsWithFixed.avro 
> PRE-CREATION 
>   
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsWithMaps.avro 
> PRE-CREATION 
>   
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsWithMapsOfRecords.avro
>  PRE-CREATION 
>   
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsWithNullableUnions.avro
>  PRE-CREATION 
>   
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/recursiveRecord.avro 
> PRE-CREATION 
>   
> test/org/apache/pig/builtin/avro/data/json/recordWithRepeatedSubRecords.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/records.json PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recordsAsOutputByPig.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recordsOfArrays.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recordsOfArraysOfRecords.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recordsSubSchema.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recordsSubSchemaNullable.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recordsWithEnums.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recordsWithFixed.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recordsWithMaps.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recordsWithMapsOfRecords.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recordsWithNullableUnions.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/data/json/recursiveRecord.json 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordWithRepeatedSubRecords.avsc 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/records.avsc PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsAsOutputByPig.avsc 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsOfArrays.avsc PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsOfArraysOfRecords.avsc 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsSubSchema.avsc PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsSubSchemaNullable.avsc 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsWithEnums.avsc PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsWithFixed.avsc PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsWithMaps.avsc PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsWithMapsOfRecords.avsc 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recordsWithNullableUnions.avsc 
> PRE-CREATION 
>   test/org/apache/pig/builtin/avro/schema/recursiveRecord.avsc PRE-CREATION 
>   test/unit-tests 0f18a0e 
> 
> Diff: https://reviews.apache.org/r/8104/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Joseph Adler
> 
>

Reply via email to