lamber-ken edited a comment on issue #1373: [WIP] [HUDI-646] Re-enable TestUpdateSchemaEvolution to reproduce CI error
URL: https://github.com/apache/incubator-hudi/pull/1373#issuecomment-596656326
 
 
   hi @vinothchandar 
   
   After many tests, I finally found the answer. Let me explain this issue step by step.
   
   Three questions:
   1. Why did the CI build fail after the code cleanup?
   2. Why doesn't JUnit report a failure in the local env?
   3. Why does `TestHBaseQPSResourceAllocator` affect `TestUpdateSchemaEvolution`?
   
   #### Stacktrace
   ```
   Job aborted due to stage failure: Task 7 in stage 1.0 failed 1 times, most recent failure: Lost task 7.0 in stage 1.0 (TID 15, localhost, executor driver): org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:/tmp/junit3406952253616234024/2016/01/31/f1-0_7-0-7_100.parquet
        at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
        at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
        at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
        at org.apache.hudi.common.util.ParquetUtils.readAvroRecords(ParquetUtils.java:190)
        at org.apache.hudi.client.TestUpdateSchemaEvolution.lambda$testSchemaEvolutionOnUpdate$dfb2f24e$1(TestUpdateSchemaEvolution.java:123)
        at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
        at scala.collection.Iterator$class.foreach(Iterator.scala:891)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
        at scala.collection.AbstractIterator.to(Iterator.scala:1334)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:123)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.UnsupportedOperationException: Byte-buffer read unsupported by input stream
        at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:146)
        at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:143)
        at org.apache.parquet.hadoop.util.H2SeekableInputStream$H2Reader.read(H2SeekableInputStream.java:81)
        at org.apache.parquet.hadoop.util.H2SeekableInputStream.readFully(H2SeekableInputStream.java:90)
        at org.apache.parquet.hadoop.util.H2SeekableInputStream.readFully(H2SeekableInputStream.java:75)
        at org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:1174)
        at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:805)
        at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:127)
        at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
        ... 29 more
   ```
   
   **Step 1:** Check whether the `f1-0_7-0-7_100.parquet` file is complete. I checked it with the HDFS API, and the file itself is fine.
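   
   For reference, a minimal sketch of such a check (illustrative, not the exact code used; it just confirms the footer, schema and record count can be read back):
   ```
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.Path;
   import org.apache.parquet.hadoop.ParquetFileReader;
   import org.apache.parquet.hadoop.util.HadoopInputFile;
   
   Configuration conf = new Configuration();
   Path parquetFile = new Path("file:/tmp/junit3406952253616234024/2016/01/31/f1-0_7-0-7_100.parquet");
   // the footer, schema and record count all read back fine, so the file itself is not corrupted
   try (ParquetFileReader reader = ParquetFileReader.open(HadoopInputFile.fromPath(parquetFile, conf))) {
     System.out.println("record count: " + reader.getRecordCount());
     System.out.println("schema: " + reader.getFooter().getFileMetaData().getSchema());
   }
   ```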
   
   **Step 2:** Notice the `UnsupportedOperationException: Byte-buffer read unsupported by input stream` exception.
   I added some log statements to `FSDataInputStream`, reran the unit test on Travis, and found two different wrapped implementation classes (a sketch of the logging is shown after the two traces below):
   
   - `org.apache.hadoop.fs.FSDataInputStream`
   - `org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker`
   
   ```
   class org.apache.hadoop.fs.FSDataInputStream
   java.lang.Exception
        at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:55)
        at org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream.<init>(ChecksumFileSystem.java:271)
        at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:351)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
        at org.apache.parquet.hadoop.util.HadoopInputFile.newStream(HadoopInputFile.java:65)
        at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:688)
        at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:596)
        at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:152)
        at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
   ```
   ```
   class org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker
   java.lang.Exception
        at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:55)
        at org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream.<init>(ChecksumFileSystem.java:271)
        at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:351)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
        at org.apache.parquet.hadoop.util.HadoopInputFile.newStream(HadoopInputFile.java:65)
        at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:688)
        at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:596)
        at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:152)
        at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
   ```
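   
   For reference, the two traces above were produced by temporary debug statements added to the `FSDataInputStream` constructor, roughly like this (a sketch; the exact statements may have differed):
   ```
   // temporary debug code inside org.apache.hadoop.fs.FSDataInputStream
   public FSDataInputStream(InputStream in) {
     super(in);
     ...
     System.out.println(in.getClass());     // which stream is actually being wrapped?
     new Exception().printStackTrace();     // who opened this stream?
   }
   ```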
   
   **Step 3:** Why do two different stack traces exist when reading the parquet file?
   From `org.apache.hadoop.fs.ChecksumFileSystem#open`, we can see that the wrapped stream type is controlled by the `verifyChecksum` flag:
   ```
   @Override
   public FSDataInputStream open(Path f, int bufferSize) throws IOException {
     FileSystem fs;
     InputStream in;
     if (verifyChecksum) {
       fs = this;
       in = new ChecksumFSInputChecker(this, f, bufferSize);
     } else {
       fs = getRawFileSystem();
       in = fs.open(f, bufferSize);
     }
     return new FSDataBoundedInputStream(fs, f, in);
   }
   ```
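   
   This matters because Parquet's `H2SeekableInputStream` reads through `FSDataInputStream.read(ByteBuffer)`, which simply delegates to the wrapped stream and throws exactly the exception we saw whenever that stream does not implement `ByteBufferReadable`. Simplified from Hadoop's `FSDataInputStream`:
   ```
   @Override
   public int read(ByteBuffer buf) throws IOException {
     if (in instanceof ByteBufferReadable) {
       return ((ByteBufferReadable) in).read(buf);
     }
     throw new UnsupportedOperationException("Byte-buffer read unsupported by input stream");
   }
   ```
   So whether the byte-buffer read path works depends on which stream ends up wrapped inside, which in turn depends on `verifyChecksum`.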
   
   **Step 4:** Debug when `verifyChecksum` gets changed. `verifyChecksum` defaults to `true`.
   
   It is changed while `new HBaseTestingUtility()` runs in `org.apache.hudi.index.TestHBaseQPSResourceAllocator`:
   ```
   java.lang.Exception
        at org.apache.hadoop.fs.ChecksumFileSystem.setWriteChecksum(ChecksumFileSystem.java:79)
        at org.apache.hadoop.hbase.fs.HFileSystem.<init>(HFileSystem.java:87)
        at org.apache.hadoop.hbase.fs.HFileSystem.get(HFileSystem.java:372)
        at org.apache.hadoop.hbase.HBaseTestingUtility.getTestFileSystem(HBaseTestingUtility.java:3058)
        at org.apache.hadoop.hbase.HBaseTestingUtility.getNewDataTestDirOnTestFS(HBaseTestingUtility.java:496)
        at org.apache.hadoop.hbase.HBaseTestingUtility.setupDataTestDirOnTestFS(HBaseTestingUtility.java:485)
        at org.apache.hadoop.hbase.HBaseTestingUtility.getDataTestDirOnTestFS(HBaseTestingUtility.java:458)
        at org.apache.hadoop.hbase.HBaseTestingUtility.getDataTestDirOnTestFS(HBaseTestingUtility.java:472)
        at org.apache.hadoop.hbase.HBaseTestingUtility.createDirsAndSetProperties(HBaseTestingUtility.java:649)
        at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSCluster(HBaseTestingUtility.java:575)
        at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:982)
        at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:863)
        at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:857)
        at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:801)
   ```
   ```
   public HFileSystem(Configuration conf, boolean useHBaseChecksum)
     throws IOException {
   
     ...
     
     // disable checksum verification for local fileSystem, see HBASE-11218
     if (fs instanceof LocalFileSystem) {
       fs.setWriteChecksum(false);
       fs.setVerifyChecksum(false);
     }
   
     ...
   }  
   ```
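   
   As a quick sanity check (a hedged sketch using the same reflection trick as in the next step, not code from the PR), the flag can be observed flipping right after the HBase mini cluster starts:
   ```
   HBaseTestingUtility utility = new HBaseTestingUtility();
   utility.startMiniCluster();   // goes through HFileSystem, see the trace above
   
   LocalFileSystem localFs = FileSystem.getLocal(new Configuration());
   Field flag = ChecksumFileSystem.class.getDeclaredField("verifyChecksum");
   flag.setAccessible(true);
   System.out.println("verifyChecksum after HBase init: " + flag.get(localFs));   // prints false here
   ```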
   
   **Step 5:** Why does `TestHBaseQPSResourceAllocator` affect `TestUpdateSchemaEvolution`?
   
   From the following code, we can see that `FileSystem` instances with the same URI are shared (cached), so both tests get the same object; see `org.apache.hadoop.fs.FileSystem` for details.
   
   ```
   FileSystem fs1 = new Path("aaa").getFileSystem(new Configuration());
   fs1.setVerifyChecksum(false);
   
   FileSystem fs2 = new Path("bbb").getFileSystem(new Configuration());
   if (fs2 instanceof ChecksumFileSystem) {
     Field field = ChecksumFileSystem.class.getDeclaredField("verifyChecksum");
     field.setAccessible(true);
     System.out.println("verifyChecksum: " + field.get(fs2));
   }
   
   System.out.println("fs1 hashcode: " + fs1.hashCode());
   System.out.println("fs2 hashcode: " + fs2.hashCode());
   
   // Output:
   // verifyChecksum: false
   // fs1 hashcode: 1288235781
   // fs2 hashcode: 1288235781
   ```
   
   **Step 6:** Why did the CI build fail after the code cleanup?
   
   We know JUnit executes test classes in alphabetical order, so after the cleanup the CI build fails:
   
   Prev:
   - TestHBaseQPSResourceAllocator
   - TestUpdateMapFunction
   
   Now:
   - TestHBaseQPSResourceAllocator
   - TestUpdateSchemaEvolution
   
   **Step 7:** Why doesn't JUnit report a failure in the local env?
   
   Because locally we only run `TestUpdateSchemaEvolution`, so `verifyChecksum` is never modified.
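   
   To reproduce it locally, both classes have to run in the same JVM and in the same order as on CI, e.g. with a throwaway JUnit suite (a sketch; the suite class name is made up):
   ```
   import org.apache.hudi.client.TestUpdateSchemaEvolution;
   import org.apache.hudi.index.TestHBaseQPSResourceAllocator;
   import org.junit.runner.RunWith;
   import org.junit.runners.Suite;
   
   @RunWith(Suite.class)
   @Suite.SuiteClasses({
       TestHBaseQPSResourceAllocator.class,   // flips verifyChecksum on the shared local FS
       TestUpdateSchemaEvolution.class        // then fails while reading the parquet file
   })
   public class ReproduceChecksumLeakSuite {}
   ```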
   
   
   
