cdmikechen edited a comment on issue #2005:
URL: https://github.com/apache/hudi/issues/2005#issuecomment-678860501
> @cdmikechen : Also, if you look at integration tests ITTestHoodieDemo, we
> cover the tests with hive syncing and this test has been passing for us. Can
> you take a look at the tests to see what the difference is ?

@bvaradar I checked the `hudi-integ-test` package and found the reason:

The `hudi-integ-test` pom.xml, which contains `ITTestHoodieDemo`, declares
`hive-exec-2.3.1` in its dependencies. So when we create a new
`MapredParquetInputFormat`, hudi loads the class from `hive-exec-2.3.1`:
```java
package org.apache.hadoop.hive.ql.io.parquet;
import java.io.IOException;
import org.apache.hadoop.hive.ql.exec.Utilities;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedInputFormatInterface;
import org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport;
import org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.parquet.hadoop.ParquetInputFormat;
public class MapredParquetInputFormat extends FileInputFormat<NullWritable,
ArrayWritable> implements VectorizedInputFormatInterface {
```
But if we use a standalone Spark environment without the hive-2.3.1
dependencies (for example, a new project that depends only on the Spark
libraries), hudi will use `hive-exec-1.2.1-spark` instead, whose class has a
different signature (note the `Void` key type instead of `NullWritable`, and
the old `parquet.hadoop` package):
```java
package org.apache.hadoop.hive.ql.io.parquet;
import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hive.ql.exec.Utilities;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedInputFormatInterface;
import org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport;
import org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.RecordReader;
import parquet.hadoop.ParquetInputFormat;
public class MapredParquetInputFormat extends FileInputFormat<Void,
ArrayWritable> {
```
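To confirm which `hive-exec` jar actually supplies the class at runtime, a small standalone diagnostic like the one below can help (this is just a sketch to illustrate the check; `ClasspathCheck` and `locate` are made-up names, not part of hudi):

```java
// Diagnostic sketch: report which jar (if any) provides a class on the
// current classpath, using the class's protection-domain code source.
public class ClasspathCheck {

    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className);
            java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK core classes loaded by the bootstrap loader have no code source.
            return src == null ? "bootstrap classloader" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // The conflicting class from the excerpts above: the printed jar path
        // tells you whether hive-exec-2.3.1 or hive-exec-1.2.1-spark won.
        System.out.println(
            locate("org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"));
    }
}
```

Running this inside the Spark application (or the integration-test JVM) shows directly which dependency Maven's conflict resolution picked.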
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]