[ 
https://issues.apache.org/jira/browse/PIG-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162025#comment-13162025
 ] 

xuting zhao commented on PIG-2387:
----------------------------------

test-commit has been successfully run on trunk, 0.10 and 0.9.

I could not find a use case where RecordReader.getProgress() is called by a 
Pig class while a Pig script is running. So instead of adding a unit test 
to this patch, I ran the following code and manually printed the result of 
getProgress() in the InterRecordReader and BinStorageRecordReader classes to 
check the correctness of the patch.

    Properties props = new Properties();
    for (Entry<Object, Object> entry : cluster.getProperties().entrySet()) {
        props.put(entry.getKey(), entry.getValue());
    }
    props.setProperty("mapred.max.split.size", "100");
    props.setProperty("pig.overrideBlockSize", "100");
    System.setProperty("pig.overrideBlockSize", "100");

    PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
    pigServer.registerQuery("a = load '" + file + "' AS (s:chararray);");
    ExecJob job = pigServer.store("a", "output", "BinStorage");
    pigServer.registerQuery("b = load 'output' using BinStorage() AS (s:chararray);");

    Iterator<Tuple> it = pigServer.openIterator("b");
    while (it.hasNext()) {
        it.next();
    }

The printed output shows:

1. When the iterator is created, a new BinStorageRecordReader is created and 
its progress is updated when the value is updated: 

new BinStorageRecordReader, its start and ends are: 0 100
Progress: 0.13


2. When iterator.next() is called, a series of InterRecordReaders is 
instantiated, and each time a tuple in the BinStorage output is visited, pos 
is updated:

new InterRecordReader, its start and ends are: 0 100
Progress: 0.1
Progress: 0.2
Progress: 0.29
Progress: 0.39
Progress: 0.49
Progress: 0.59
Progress: 0.69
Progress: 0.79
Progress: 0.89
Progress: 0.99
Progress: 1.0
new InterRecordReader, its start and ends are: 100 200
Progress: 0.19
Progress: 0.29
Progress: 0.39
Progress: 0.49
Progress: 0.59
Progress: 0.69
Progress: 0.79
Progress: 0.89
Progress: 0.99
Progress: 1.0

....
new InterRecordReader, its start and ends are: 800 900
Progress: 0.16
Progress: 0.26
Progress: 0.36
Progress: 0.45
Progress: 0.55
Progress: 0.65
Progress: 0.75
Progress: 0.85
Progress: 0.95
Progress: 1.0
new InterRecordReader, its start and ends are: 900 995
Progress: 0.15789473
Progress: 0.2631579
Progress: 0.36842105
Progress: 0.47368422
Progress: 0.57894737
Progress: 0.68421054
Progress: 0.7894737
Progress: 0.8947368
Progress: 1.0
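
As a rough cross-check of the last split above (900 to 995), here is a minimal 
sketch that applies the getProgress() formula quoted in the issue description 
to a sequence of positions. The class name and the positions (first tuple 
ending at offset 915, then 10 bytes per tuple) are assumptions chosen so that 
the fractions line up with the printed values; they are not taken from the 
patch itself.

```java
public class SplitProgressSketch {

    // Same formula as the getProgress() quoted in the issue description,
    // with pos, start and end passed in explicitly.
    static float progress(long pos, long start, long end) {
        if (start == end) {
            return 0.0f;
        }
        return Math.min(1.0f, (pos - start) / (float) (end - start));
    }

    public static void main(String[] args) {
        long start = 900;
        long end = 995;
        // Hypothetical positions: 915, 925, ..., 995 (assumed, not measured).
        for (long pos = 915; pos <= end; pos += 10) {
            System.out.println("Progress: " + progress(pos, start, end));
        }
    }
}
```

With these assumed positions the first fraction is 15/95 and the last is 
95/95, which is consistent with the 0.15789473 and 1.0 printed for the final 
split.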
                
> BinStorageRecordReader causes negative progress
> -----------------------------------------------
>
>                 Key: PIG-2387
>                 URL: https://issues.apache.org/jira/browse/PIG-2387
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0, 0.9.0
>            Reporter: Anitha Raju
>            Assignee: xuting zhao
>             Fix For: 0.9.2
>
>         Attachments: PIG-2387.patch
>
>
> Hi,
> When an input file of size greater than default split size is loaded using 
> BinStorage() and some processing is done, the task returns negative progress
> Script
> {code}
> A = load 'input' using BinStorage() as (a:chararray);
> B = filter A by (a matches '.*blinds.*');
> store B into 'op';
> {code}
> Looking at the code, BinStorage which uses BinStorageRecordReader, has 
> getProgress()
> {code}
> public float getProgress() {
>     if (start == end) {
>       return 0.0f;
>     } else {
>           return Math.min(1.0f, (pos - start) / (float)(end - start));
>     }
>   }
> {code}
> In BinStorageRecordReader, pos is always 0 and not getting updated at any 
> point.
> So when the input file of size greater than default split size is loaded and 
> processed, the getProgress() method returns negative value, thus showing 
> negative progress.
> Regards,
> Anitha 
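
To illustrate the reported behaviour, here is a minimal sketch of the same 
formula with pos left at 0, as described in the issue. The class name and the 
100-byte split boundaries are assumptions for illustration only (borrowed from 
the test settings in the comment above), not values from a real job.

```java
public class NegativeProgressSketch {

    // Formula from the issue description, with the fields passed as arguments.
    static float progress(long pos, long start, long end) {
        if (start == end) {
            return 0.0f;
        }
        return Math.min(1.0f, (pos - start) / (float) (end - start));
    }

    public static void main(String[] args) {
        long pos = 0; // never updated, as reported in the issue

        // First split starts at 0, so the stale pos gives 0.0.
        System.out.println(progress(pos, 0, 100));   // prints 0.0

        // Any later split has start > 0, so (pos - start) is negative.
        System.out.println(progress(pos, 100, 200)); // prints -1.0
    }
}
```

This is why the problem only appears once the input is larger than one split: 
the first split never sees a negative value, while every later split does.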


        
