-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5936/
-----------------------------------------------------------

(Updated July 19, 2012, 1:23 a.m.)


Review request for pig.


Changes
-------

1) Added more unit tests including some negative tests.

2) Removed getPathsFromString() because I realized that fs.globStatus() 
implicitly expands a comma-separated string into paths, so doing it explicitly 
is redundant.

3) Changed the type of the first parameter of getAllSubDirs() from URI to 
hadoop.fs.Path. This is needed because '{' and '}' are not allowed in a URI, so 
URI.create() throws an IllegalArgumentException (wrapping a URISyntaxException) 
on a glob pattern, whereas these characters are automatically escaped when 
constructing a Path. This wasn't an issue in my previous patch because 
getPathsFromString() implicitly converted a glob pattern to paths; now that 
getPathsFromString() is removed, the conversion has to be done explicitly.

In fact, this reverts some changes made by PIG-2540 
(https://issues.apache.org/jira/browse/PIG-2540). However, this does not break 
S3 support because inside getAllSubDirs(), the file system is still constructed 
for the given URI, and globStatus() is called on that file system.

FileSystem fs = FileSystem.get(path.toUri(), job.getConfiguration());
FileStatus[] matchedFiles = fs.globStatus(path);

So if path is an S3 URI, the S3 file system will be used.
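
The URI restriction described in 3) can be reproduced with the JDK alone. The 
following is a minimal sketch, not code from the patch: the class name, helper 
names, and the bucket/path are hypothetical. It shows that URI.create() rejects 
a glob pattern containing '{' and '}', while the multi-argument URI constructor 
quotes those characters and keeps the scheme intact. I believe hadoop.fs.Path 
builds its URI in a similar component-wise way, but treat that as an assumption.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class GlobUriDemo {
    /** Returns true if URI.create() accepts the string, false if it rejects it. */
    public static boolean parsesAsUri(String s) {
        try {
            URI.create(s);
            return true;
        } catch (IllegalArgumentException e) {
            // URI.create() wraps the underlying URISyntaxException.
            return false;
        }
    }

    /** Builds a URI from components; this constructor quotes illegal characters. */
    public static URI fromComponents(String scheme, String authority, String path) {
        try {
            return new URI(scheme, authority, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical glob pattern, for illustration only.
        String glob = "s3://bucket/data/{2012,2013}/*.avro";
        System.out.println("URI.create accepts glob: " + parsesAsUri(glob));

        URI u = fromComponents("s3", "bucket", "/data/{2012,2013}/*.avro");
        System.out.println("scheme:       " + u.getScheme());
        System.out.println("raw path:     " + u.getRawPath());
        System.out.println("decoded path: " + u.getPath());
    }
}
```

Because the scheme survives the round trip, a call such as 
FileSystem.get(path.toUri(), conf) can still pick the file system from it.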


Description
-------

Add glob support to AvroStorage:

https://issues.apache.org/jira/browse/PIG-2492


This addresses bug PIG-2492.
    https://issues.apache.org/jira/browse/PIG-2492


Diffs (updated)
-----

  
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 0f8ef27 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java c7de726 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 48b093b 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java e5d0c38 

Diff: https://reviews.apache.org/r/5936/diff/


Testing
-------

1. Added new unit tests as follows:

- testDir verifies that AvroStorage recursively loads files in a directory and 
its sub-directories.
- testGlob1 through testGlob3 verify that glob patterns are expanded properly.

To run the tests, please do the following:

wget https://issues.apache.org/jira/secure/attachment/12536534/avro_test_files.tar.gz
tar -xf avro_test_files.tar.gz
ant clean compile-test piggybank -Dhadoopversion=20
cd contrib/piggybank/java
ant test -Dtestcase=TestAvroStorage

2. Both TestAvroStorage and TestAvroStorageUtils pass.


Thanks,

Cheolsoo Park
