-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8765/
-----------------------------------------------------------

(Updated Dec. 31, 2012, 1:56 a.m.)


Review request for pig, Santhosh Srinivasan, Jonathan Coveney, and Joseph Adler.


Changes
-------

- The error rate is printed as part of job stats.
- The error message is improved. Now the location of the bad split that causes 
the run-time exception is printed.
- InputErrorTracker counts the number of splits instead of records.
- For backward compatibility, ignore_bad_files is not removed. When the 
ignore_bad_files option is enabled in AvroStorage, it is equivalent to setting 
pig.load.bad.split.threshold to 1.0.
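
As a rough illustration of the split-based counting described above, here is a
minimal sketch. It is not the InputErrorTracker class from the patch: the class
name, constructor, and method names are hypothetical, and the rule "fail only
when the bad-split count exceeds a minimum and the error rate exceeds the
threshold" is an assumption about how the two settings interact.

```java
/**
 * Hypothetical sketch of split-level error tracking.
 * NOT the InputErrorTracker from the patch; all names and the exact
 * failure rule are assumptions made for illustration.
 */
public class SplitErrorTrackerSketch {
    private final double threshold; // tolerated fraction of bad splits, 0.0 to 1.0
    private final long minErrors;   // assumed: never fail until more than this many bad splits
    private long totalSplits;
    private long badSplits;

    public SplitErrorTrackerSketch(double threshold, long minErrors) {
        if (threshold < 0.0 || threshold > 1.0) {
            throw new IllegalArgumentException("threshold must be in [0.0, 1.0]");
        }
        this.threshold = threshold;
        this.minErrors = minErrors;
    }

    /** Records a split that was read without error. */
    public void recordGood() {
        totalSplits++;
    }

    /** Records a bad split and fails once the error rate exceeds the threshold. */
    public void recordBad(String splitLocation, Exception cause) {
        totalSplits++;
        badSplits++;
        if (badSplits > minErrors && errorRate() > threshold) {
            // Include the split location so the failing input can be found.
            throw new RuntimeException("Bad split " + splitLocation
                    + ": error rate " + errorRate()
                    + " exceeds threshold " + threshold, cause);
        }
    }

    /** Fraction of splits seen so far that were bad. */
    public double errorRate() {
        return totalSplits == 0 ? 0.0 : (double) badSplits / totalSplits;
    }
}
```

Under these assumptions a threshold of 1.0 can never be exceeded, which matches
the statement that ignore_bad_files behaves like setting
pig.load.bad.split.threshold to 1.0. How counts are aggregated across tasks is
not shown here.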


Description
-------

This patch implements configurable bad-record thresholds, based on work done by 
Jonathan in PIG-2614.

The changes include:
- Adds new Pig properties - pig.load.bad.record.threshold and 
pig.load.bad.record.min.
- Removes 'ignore_bad_files' option from AvroStorage since it's no longer 
needed.
- Incorporates InputErrorTracker class written by Jonathan in PIG-2614.
- Adds a try-catch block to nextKeyValue() method in PigRecordReader.
- Adds new test cases to TestAvroStorage for these new properties.
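
The try-catch change to nextKeyValue() listed above can be pictured with the
following minimal sketch. It does not reproduce the code in PigRecordReader:
the Reader interface, the class name, and the tolerate flag are hypothetical
stand-ins, and treating a failed split as exhausted is an assumption.

```java
import java.io.IOException;

/**
 * Hypothetical sketch of wrapping a reader's nextKeyValue() in a try-catch.
 * NOT the PigRecordReader change from the patch; names are assumptions.
 */
public class TolerantReaderSketch {
    /** Minimal stand-in for a record reader (hypothetical). */
    public interface Reader {
        boolean nextKeyValue() throws IOException;
    }

    private final Reader wrapped;
    private final String splitLocation;
    private final boolean tolerate;  // assumed switch: skip bad splits instead of failing
    private boolean splitFailed;

    public TolerantReaderSketch(Reader wrapped, String splitLocation, boolean tolerate) {
        this.wrapped = wrapped;
        this.splitLocation = splitLocation;
        this.tolerate = tolerate;
    }

    /** Returns true if another record is available; treats a bad split as exhausted. */
    public boolean nextKeyValue() throws IOException {
        if (splitFailed) {
            return false;
        }
        try {
            return wrapped.nextKeyValue();
        } catch (Exception e) {
            if (!tolerate) {
                // Report where the bad input lives before failing.
                throw new IOException("Error reading split " + splitLocation, e);
            }
            splitFailed = true;
            return false;
        }
    }

    /** Whether this split hit a read error that was tolerated. */
    public boolean hasFailed() {
        return splitFailed;
    }
}
```

In a fuller version the tolerated failure would be reported to an error
tracker so the threshold check can decide whether the job should continue.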


This addresses bug PIG-3059.
    https://issues.apache.org/jira/browse/PIG-3059


Diffs (updated)
-----

  conf/pig.properties 001a75e 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 771c313 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java 0a84915 
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 9c37fec 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 28a448f 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/expected_testCorruptedFile2.avro e69de29 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/expected_testCorruptedFile3.avro e69de29 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/expected_testCorruptedFile4.avro e69de29 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_corrupted_file/bad.avro e69de29 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_corrupted_file/good.avro e69de29 
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/InputErrorTracker.java e69de29 
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigRecordReader.java 6c77bad 
  src/org/apache/pig/tools/pigstats/EmbeddedPigStats.java 45135b6 
  src/org/apache/pig/tools/pigstats/JobStats.java bdc08a5 
  src/org/apache/pig/tools/pigstats/PigStats.java 0228997 
  src/org/apache/pig/tools/pigstats/PigStatsUtil.java 521a482 
  src/org/apache/pig/tools/pigstats/SimplePigStats.java e4cd1c0 

Diff: https://reviews.apache.org/r/8765/diff/


Testing
-------

ant clean commit-test
ant clean compile-test jar-withouthadoop
cd contrib/piggybank/java
ant clean test -Dtestcase=TestAvroStorage


Thanks,

Cheolsoo Park
