[
https://issues.apache.org/jira/browse/PIG-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162025#comment-13162025
]
xuting zhao commented on PIG-2387:
----------------------------------
test-commit has been run successfully on trunk, 0.10 and 0.9.
I could not find a use case where RecordReader.getProgress() is called by a
Pig class while a Pig script is running. So instead of adding a unit test
to this patch, I ran the following script and manually printed the result of
getProgress() in the InterRecordReader and BinStorageRecordReader classes to
check the correctness of the patch.
Properties props = new Properties();
for (Entry<Object, Object> entry : cluster.getProperties().entrySet()) {
    props.put(entry.getKey(), entry.getValue());
}
// Use a 100-byte split/block size so the input is read as several splits.
props.setProperty("mapred.max.split.size", "100");
props.setProperty("pig.overrideBlockSize", "100");
System.setProperty("pig.overrideBlockSize", "100");
PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
pigServer.registerQuery("a = load '" + file + "' AS (s:chararray);");
ExecJob job = pigServer.store("a", "output", "BinStorage");
pigServer.registerQuery("b = load 'output' using BinStorage() AS (s:chararray);");
// Iterate over every tuple so the record readers advance to the end of each split.
Iterator<Tuple> it = pigServer.openIterator("b");
while (it.hasNext()) {
    it.next();
}
The printed output shows:
1. When the iterator is created, a new BinStorageRecordReader is created and
the progress is updated when the value is updated:
new BinStorageRecordReader, its start and ends are: 0 100
Progress: 0.13
2. When iterator.next() is called, a series of InterRecordReaders is
instantiated, and each time a tuple in the BinStorage data is visited, pos is
updated:
new InterRecordReader, its start and ends are: 0 100
Progress: 0.1
Progress: 0.2
Progress: 0.29
Progress: 0.39
Progress: 0.49
Progress: 0.59
Progress: 0.69
Progress: 0.79
Progress: 0.89
Progress: 0.99
Progress: 1.0
new InterRecordReader, its start and ends are: 100 200
Progress: 0.19
Progress: 0.29
Progress: 0.39
Progress: 0.49
Progress: 0.59
Progress: 0.69
Progress: 0.79
Progress: 0.89
Progress: 0.99
Progress: 1.0
....
new InterRecordReader, its start and ends are: 800 900
Progress: 0.16
Progress: 0.26
Progress: 0.36
Progress: 0.45
Progress: 0.55
Progress: 0.65
Progress: 0.75
Progress: 0.85
Progress: 0.95
Progress: 1.0
new InterRecordReader, its start and ends are: 900 995
Progress: 0.15789473
Progress: 0.2631579
Progress: 0.36842105
Progress: 0.47368422
Progress: 0.57894737
Progress: 0.68421054
Progress: 0.7894737
Progress: 0.8947368
Progress: 1.0
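The manual check above amounts to confirming that every printed value stays within [0, 1] and never decreases within a split. A minimal sketch of that check follows, using the values printed for the last split (900 to 995). The class ProgressCheck and its method isValidProgress are hypothetical names for illustration only; they are not part of Pig or of the attached patch.

```java
// Hypothetical helper for illustration; not part of Pig or of PIG-2387.patch.
final class ProgressCheck {
    private ProgressCheck() {}

    /** Returns true if every value lies in [0, 1] and the sequence never decreases. */
    static boolean isValidProgress(float[] values) {
        float previous = 0.0f;
        for (float v : values) {
            if (Float.isNaN(v) || v < 0.0f || v > 1.0f || v < previous) {
                return false;
            }
            previous = v;
        }
        return true;
    }

    public static void main(String[] args) {
        // Values printed for the split with start 900 and end 995.
        float[] lastSplit = {
            0.15789473f, 0.2631579f, 0.36842105f, 0.47368422f, 0.57894737f,
            0.68421054f, 0.7894737f, 0.8947368f, 1.0f
        };
        System.out.println(isValidProgress(lastSplit)); // true

        // A negative value, as described in the bug report, fails the check.
        System.out.println(isValidProgress(new float[] {-1.0f})); // false
    }
}
```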
> BinStorageRecordReader causes negative progress
> -----------------------------------------------
>
> Key: PIG-2387
> URL: https://issues.apache.org/jira/browse/PIG-2387
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.0, 0.9.0
> Reporter: Anitha Raju
> Assignee: xuting zhao
> Fix For: 0.9.2
>
> Attachments: PIG-2387.patch
>
>
> Hi,
> When an input file of size greater than the default split size is loaded using
> BinStorage() and some processing is done, the task returns negative progress.
> Script:
> {code}
> A = load 'input' using BinStorage() as (a:chararray);
> B = filter A by (a matches '.*blinds.*');
> store B into 'op';
> {code}
> Looking at the code, BinStorage which uses BinStorageRecordReader, has
> getProgress()
> {code}
> public float getProgress() {
>     if (start == end) {
>         return 0.0f;
>     } else {
>         return Math.min(1.0f, (pos - start) / (float)(end - start));
>     }
> }
> {code}
> In BinStorageRecordReader, pos is always 0 and not getting updated at any
> point.
> So when the input file of size greater than default split size is loaded and
> processed, the getProgress() method returns negative value, thus showing
> negative progress.
> Regards,
> Anitha
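To make the arithmetic in the report concrete, here is a small self-contained sketch that reuses the getProgress() formula quoted above. The class SplitProgress and its setPos method are hypothetical stand-ins, not Pig's BinStorageRecordReader and not the attached patch; the split bounds 100 and 200 are example values. It assumes, as the report states, that pos stays at 0 unless something updates it.

```java
// Hypothetical stand-in for a record reader's progress bookkeeping; not Pig code.
final class SplitProgress {
    private final long start;
    private final long end;
    private long pos; // stays at 0 unless setPos is called, as in the reported bug

    SplitProgress(long start, long end) {
        this.start = start;
        this.end = end;
    }

    /** Records the current read position within the file. */
    void setPos(long pos) {
        this.pos = pos;
    }

    /** Same formula as the getProgress() quoted in the issue. */
    float getProgress() {
        if (start == end) {
            return 0.0f;
        } else {
            return Math.min(1.0f, (pos - start) / (float) (end - start));
        }
    }

    public static void main(String[] args) {
        // Second split of a file: start is greater than 0.
        SplitProgress reader = new SplitProgress(100L, 200L);

        // pos was never updated, so (0 - 100) / 100 is negative.
        System.out.println(reader.getProgress()); // -1.0

        // Once pos tracks the read position, progress falls in [0, 1].
        reader.setPos(150L);
        System.out.println(reader.getProgress()); // 0.5
    }
}
```

For the first split (start 0) a stale pos yields 0 rather than a negative number, which is consistent with the report that the problem only shows up once the input is larger than one split.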