> On Jan. 8, 2014, 11:27 p.m., Daniel Dai wrote:
> > Thanks Josh, this is a very well written code to make Accumulo available to 
> > Pig. I can make it work locally. I have two major comment:
> > 1. Bear in mind, Pig is good to manipulate tuple, Ok for map, and horrible 
> > for bag. A columnar data would be perfect. AccumuloWholeRowStorage does not 
> > seems pretty useful since it requires user to manipulate bag in Pig script. 
> > User will have to write a UDF for it. AccumuloKVStorage is fine, but it is 
> > also awkward to manipulate the flattened data. AccumuloStorage is the best 
> > style among these three, user can project map entries to get the columnar 
> > data. But personally, I would still prefer HBastStorage style in which user 
> > can specify the column he's interested in construct arguments. So I would 
> > suggest to improve AccumuloStorage (or a new class) to include that.
> > 
> > 2. caster is not right. You will need to return correct caster in 
> > getLoadCaster. Otherwise cast data into Pig type will fail. Eg:
> > a = load 'accumulo://....' using 
> > org.apache.pig.backend.hadoop.accumulo.AccumuloStorage() as (key:bytearray, 
> > m:map[]);
> > b = foreach a generate (double)m#'gpa';
> > I see you give Utf8StorageConverter as the converter in some of the 
> > loaders, but Utf8StorageConverter will assume numerical data in string 
> > style, and try to parse the string into numerical. I see there is 
> > objToBytes in both AbstractAccumuloStorage and Utils. These should be 
> > consolidated into a caster (refer to HBaseBinaryConverter)
> 
> Josh Elser wrote:
>     For #1, I agree. I was never really sure about using the WholeRowStorage 
> variant, I think it probably just introduces more complexity, and, like you 
> said, isn't a great example for Pig in the first place. I can probably add in 
> another constructor variant that lets you set a column projection. Perhaps 
> the "fetch_column" option should be removed from the URI completely?
>     
>     As a follow on, I want to reduce the args to that accumulo URI (probably 
> pulling some file off of the classpath) so the user doesn't always have to 
> re-type the same four args (user, pass, instance name and zookeeper hosts).
>     
>     For #2, I think I understand, but I haven't been able to generate a 
> failure given the little snippet you provided. I'll look at the HBaseStorage 
> and HBaseBinaryConverter as an example.
> 
> Daniel Dai wrote:
>     For #1, I would prefer put column specification in the construct, just 
> like HBaseStorage did. 
>     
>     For #2, the exception is trapped, you will not see failure but null value 
> in the output.

I wasn't a big fan of the URI option parsing stuff, so I ended up taking the 
dive to move away from it. I imagine people currently use it like, so, in a 
effort to help them transition to AccumuloStorage in Pig (instead of the extra 
project sitting in an Accumulo repo), I've tried to keep it backwards 
compatible. Thanks again for your comments!


- Josh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16533/#review31404
-----------------------------------------------------------


On Jan. 10, 2014, 7:20 p.m., Josh Elser wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16533/
> -----------------------------------------------------------
> 
> (Updated Jan. 10, 2014, 7:20 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Bugs: PIG-3573
>     https://issues.apache.org/jira/browse/PIG-3573
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> Provides basic StoreFunc and LoadFunc implementations. Based off of code that 
> was in an Accumulo contrib project.
> 
> 
> Diffs
> -----
> 
>   build.xml 575c9ae 
>   ivy.xml 180eb2c 
>   ivy/libraries.properties 14abdf8 
>   src/org/apache/pig/backend/hadoop/accumulo/AbstractAccumuloStorage.java 
> PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/accumulo/AccumuloBinaryConverter.java 
> PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/accumulo/AccumuloStorage.java 
> PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/accumulo/AccumuloStorageOptions.java 
> PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/accumulo/FixedByteArrayOutputStream.java 
> PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/accumulo/Utils.java PRE-CREATION 
>   
> test/org/apache/pig/backend/hadoop/accumulo/AbstractAccumuloStorageTest.java 
> PRE-CREATION 
>   test/org/apache/pig/backend/hadoop/accumulo/AccumuloPigClusterTest.java 
> PRE-CREATION 
>   
> test/org/apache/pig/backend/hadoop/accumulo/AccumuloStorageConfigurationTest.java
>  PRE-CREATION 
>   test/org/apache/pig/backend/hadoop/accumulo/AccumuloStorageOptionsTest.java 
> PRE-CREATION 
>   test/org/apache/pig/backend/hadoop/accumulo/AccumuloStorageTest.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/16533/diff/
> 
> 
> Testing
> -------
> 
> Local tests reading, writing and JOIN'ing Accumulo tables. Tested against 
> Hadoop-1.0.4 and 2.2.0, with Accumulo 1.5.0
> 
> 
> Thanks,
> 
> Josh Elser
> 
>

Reply via email to