> On March 1, 2013, 2:54 p.m., Jonathan Coveney wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java,
> >  line 157
> > <https://reviews.apache.org/r/9685/diff/1/?file=263710#file263710line157>
> >
> >     In the case where they do not give it a schema, I think that they 
> > shouldn't have to actually specify that it is a map. It's implicit. 
> > Furthermore, I think that instead of passing the Schema as an argument, if 
> > they do an "as (schema)" THEN it should attempt to return a Schema in 
> > accordance with what you provide.
> >     
> >     Open to argument, though.
> 
> Taku Miyakawa wrote:
>     > In the case where they do not give it a schema, I think that they 
> shouldn't have to actually specify that it is a map.
>     
>     Yes ":map[]" is redundant. I will remove it.
>     
>     > if they do an "as (schema)" THEN it should attempt to return a Schema 
> in accordance with what you provide.
>     
>     Does this code meet your point?
>     
>     <code>
>     access = LOAD 'access.log' USING 
> org.apache.pig.piggybank.storage.LTSVLoader() AS (host:chararray, 
> ua:chararray);
>     ua = FOREACH access GENERATE ua;
>     dump ua;
>     </code>
>     
>     It will be good, but I could not find a way to implement the function in 
> that manner. LoadFunc seems to have no method which can get an input schema 
> specified by users.
> 
> Cheolsoo Park wrote:
>     >> It will be good, but I could not find a way to implement the function 
> in that manner. LoadFunc seems to have no method which can get an input 
> schema specified by users.
>     
>     You don't need access to the schema given by the AS clause inside 
> LoadFunc. You can just make LoadMetaData.getSchema() return null and load 
> every field as bytearray in LoadFunc.getNext(). Then, Pig inserts a FOREACH 
> operator after LOAD and does type conversion using the schema given by the AS 
> clause.
>     
>     This is what happens when PigStorage is not given a JSON schema file, but 
> the output schema is defined in the AS clause.
> 
> Jonathan Coveney wrote:
>     Cheolsoo is totally correct. That said, this should probably be 
> documented better. Also, the fact that we rely on an implicit foreach is so 
> gross... but this is how you do it!
> 
> Taku Miyakawa wrote:
>     I think the function is similar to JsonLoader, it also takes a schema as 
> an argument.
>     
>     An LTSV file itself contains field names in each row.  A column consists 
> of a name and a value (<name>:<value>).  And users may want to select fields 
> by their names.  That is why the loader function should know field names, and 
> it takes a schema as an argument.
>     
>     Let me explain in detail.
>     
>     In an LTSV file, orders of columns may differ for each row (although it 
> is not a common case).  And a row may lack some columns.  In this example, 
> the second row lacks "ua" column, and the third row has a different order of 
> columns, but it's OK.
>     
>     {data}
>     host:pc.example.com       req:GET /index.html     ua:Opera/9.80
>     host:user.example.net     req:GET /favicon.ico
>     ua:Mozilla/5.0    req:GET /news.html      host:workstation.example.org
>     {/data}
>     
>     To load "host" column and "ua" column into Pig fields, the loader 
> function should know their names, because it cannot depend on positions of 
> columns.
>     
>     {code}
>     access = LOAD 'access.log' USING LTSVLoader('host, ua');
>     DUMP access;
>     
>     -- Output:
>     -- (pc.example.com, Opera/9.80)
>     -- (user.example.net,)
>     -- (workstation.example.org, Mozilla/5.0)
>     {/code}
>

Hmm, ask on the dev@ mailing list about this. I understand your reasoning and 
it is reasonable, but that said, this syntax irks me. I think there has to be a 
way to do this, but I am not sure what it is. If there isn't let's add one :) 
But there should be. Once we nail that, I will give it another go but it looks 
close to a +1.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9685/#review17239
-----------------------------------------------------------


On March 9, 2013, 8:18 a.m., Taku Miyakawa wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9685/
> -----------------------------------------------------------
> 
> (Updated March 9, 2013, 8:18 a.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> -------
> 
> This is a review board for  https://issues.apache.org/jira/browse/PIG-3215
> 
> The patch adds LTSVLoader function and its test class.
> 
> 
> This addresses bug PIG-3215.
>     https://issues.apache.org/jira/browse/PIG-3215
> 
> 
> Diffs
> -----
> 
>   
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java
>  PRE-CREATION 
>   
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestLTSVLoader.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/9685/diff/
> 
> 
> Testing
> -------
> 
> ant compile-test
> cd contrib/piggybank/java/
> ant -Dtestcase=TestLTSVLoader test
> 
> 
> Thanks,
> 
> Taku Miyakawa
> 
>

Reply via email to