This is an automated email from the ASF dual-hosted git repository.
vhs pushed a change to branch variant-intro-shredded-avro-support
in repository https://gitbox.apache.org/repos/asf/hudi.git
omit e4fcb807db52 feat(schema): Add read + write support for shredded for AVRO
omit 8862585a1b4e feat(schema): Config path implemented for spark record type
omit f2dea5263f1f feat(schema): Add support to write shredded variants (#18036)
omit fd20018e05f1 feat(vector): Add Spark VECTOR Search TVF with intial KNN algorithm (#18432)
omit 3c53b91d9633 perf(metadata): avoid recursive calls for partition listing using catalog (#18265)
omit d60be1448d5b fix: do not shutdown distruptor thread in snapshotState in flink connnector (#18446)
omit 0baabba65f36 feat(common): Add API to fetch log files created on or before given instant time (#18142)
omit 57c3f632a203 refactor: consolidate common utility classes for Flink CDC read (#18436)
omit d69826cc5dc2 fix(flink): Trigger a failover after pending instants recommitted to b… (#18434)
omit 3e662f9828d3 feat (flink): improvement bucket assignment for MOR with bucket index (#18444)
omit 76b4fa404c72 feat(table-services): Support hoodie.clustering.enable.expirations to allow cleanup of failed clustering plans (intended for PreferWriterConflictResolutionStrategy) (#18302)
omit 149d84f89705 feat(metrics): Add table-specific metrics registry support for multi-tenant scenarios (#18179)
omit c9d0ffb6ccca fix(common): close parquet reader iterator on EOF (#18407)
omit 9af7f29ceef7 perf(table-services): Only attempt scheduling log compaction if number of deltacommits is at least LogCompactionBlocksThreshold (#18306)
omit 180592a0ad71 feat(spark): implement column pruning for incremental queries (#17514)
omit 447af5a8dbc7 fix(spark): Ignore duplicate fields when merging schema in IncrementalRelation (#17776)
omit 3e1d3008ec14 feat(vector): Add guard for user creating nested VECTOR (#18431)
omit ea59d60a86d1 fix(spark): validate and normalize incremental start/end instants (#18426)
omit 6079b6a7ed65 feat: support limit push down in Hudi Flink Source V2 (#18406)
omit 35e2bbf9813f refactor: add import (#18428)
omit f6a33fd828d5 feat(lance): Implement canWrite() in HoodieSparkLanceWriter with configurable max file size for Lance (#18341)
omit bef0c54ac5b4 [HUDI-3055] Fix hardcoded GZIP compression codec in HFileUtils (#18263)
omit dd8fe9953f68 refactor: remove HoodieWriteConfig.getOrcCompressionCodec() function (#18422)
omit 75918eabcb33 feat(blob): Create blobs in Spark SQL (#18347)
omit 2b56eae4b8c0 feat(utilities): add option to make all schema columns nullable for backwards compatibility (#17777)
omit e270e256bac6 perf: Improve Serialization Performance of BufferedRecord (#18418)
omit 1ff0506ec340 fix: Use target schema for non-FileBased/SchemaRegistry providers in SourceFormatAdapter (#17946)
omit 236dc2233a70 test(lance): Add test of bloomFilter support to TestLanceDataSource (#18388)
add 78109aa88b4a feat(schema): Add support to write shredded variants
add b702b7d75d83 feat(schema): Config path implemented for spark record type
add f80cb5a3f4e9 feat(schema): Add read + write support for shredded for AVRO
This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version. This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:
 * -- * -- B -- O -- O -- O   (e4fcb807db52)
            \
             N -- N -- N   refs/heads/variant-intro-shredded-avro-support (f80cb5a3f4e9)
You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.
Any revisions marked "omit" are not gone; other references still
refer to them. Any revisions marked "discard" are gone forever.
No new revisions were added by this update.
Summary of changes:
.../org/apache/hudi/client/BaseHoodieClient.java | 5 +
.../hudi/client/BaseHoodieTableServiceClient.java | 69 +-
.../apache/hudi/config/HoodieClusteringConfig.java | 37 +-
.../org/apache/hudi/config/HoodieWriteConfig.java | 15 +-
.../compact/ScheduleCompactionActionExecutor.java | 65 +-
.../apache/hudi/util/DistributedRegistryUtil.java | 56 -
.../client/TestBaseHoodieTableServiceClient.java | 245 ----
.../apache/hudi/config/TestHoodieWriteConfig.java | 83 --
.../apache/hudi/util/HoodieSchemaConverter.java | 2 +-
.../apache/hudi/client/SparkRDDWriteClient.java | 30 +-
.../client/common/HoodieSparkEngineContext.java | 54 +-
.../io/storage/HoodieSparkFileWriterFactory.java | 12 +-
.../hudi/io/storage/HoodieSparkLanceWriter.java | 104 +-
.../row/HoodieInternalRowFileWriterFactory.java | 19 +-
.../storage/row/HoodieRowParquetWriteSupport.java | 45 +-
.../apache/hudi/metrics/DistributedRegistry.java | 5 -
.../apache/spark/sql/HoodieInternalRowUtils.scala | 37 +
.../sql/avro/HoodieSparkSchemaConverters.scala | 30 +-
.../parquet/HoodieParquetFileFormatHelper.scala | 3 +-
.../org/apache/spark/sql/hudi/SparkAdapter.scala | 26 +
.../hudi/common/table/log/TestLogReaderUtils.java | 245 ----
.../hudi/metrics/TestDistributedRegistry.java | 231 ----
.../org/apache/hudi/BaseHoodieTableFileIndex.java | 22 +-
.../hudi/common/config/HoodieStorageConfig.java | 11 -
.../hudi/common/engine/HoodieEngineContext.java | 14 -
.../apache/hudi/common/schema/HoodieSchema.java | 50 +-
.../hudi/common/table/log/LogReaderUtils.java | 79 --
.../common/table/timeline/BaseHoodieTimeline.java | 11 -
.../hudi/common/table/timeline/HoodieTimeline.java | 5 -
.../apache/hudi/common/util/CompactionUtils.java | 65 -
.../org/apache/hudi/common/util/HFileUtils.java | 3 +-
.../common/util/HoodieCommonKryoRegistrar.java | 4 +-
.../common/util/queue/DisruptorMessageQueue.java | 27 -
.../metadata/FileSystemBackedTableMetadata.java | 12 +-
.../hudi/metadata/HoodieBackedTableMetadata.java | 13 +-
.../apache/hudi/metadata/HoodieTableMetadata.java | 7 -
.../apache/hudi/util/PartitionPathFilterUtil.java | 38 -
.../hudi/common/schema/TestHoodieSchema.java | 69 +-
.../hudi/util/TestPartitionPathFilterUtil.java | 56 -
.../apache/hudi/configuration/FlinkOptions.java | 8 +
.../hudi/sink/StreamWriteOperatorCoordinator.java | 48 +-
...AppendWriteFunctionWithDisruptorBufferSort.java | 30 +-
.../hudi/sink/bootstrap/RLIBootstrapOperator.java | 89 +-
.../org/apache/hudi/sink/event/Correspondent.java | 38 +
.../org/apache/hudi/sink/utils/EventBuffers.java | 15 +-
.../java/org/apache/hudi/sink/utils/Pipelines.java | 7 +-
.../org/apache/hudi/source/HoodieScanContext.java | 1 -
.../java/org/apache/hudi/source/HoodieSource.java | 2 +-
.../hudi/source/reader/HoodieSourceReader.java | 9 +-
.../source/reader/HoodieSourceSplitReader.java | 32 +-
.../apache/hudi/source/reader/RecordLimiter.java | 83 --
.../function/AbstractSplitReaderFunction.java | 67 -
.../function/HoodieCdcSplitReaderFunction.java | 815 +++++++++++-
.../reader/function/HoodieSplitReaderFunction.java | 32 +-
.../source/split/assign/HoodieSplitAssigners.java | 2 +-
.../split/assign/HoodieSplitBucketAssigner.java | 23 +-
.../org/apache/hudi/table/HoodieTableSource.java | 7 +-
.../hudi/table/format/cdc/CdcImageManager.java | 197 ---
.../hudi/table/format/cdc/CdcInputFormat.java | 788 +++++++++++-
.../apache/hudi/table/format/cdc/CdcIterators.java | 667 ----------
.../sink/TestStreamWriteOperatorCoordinator.java | 28 +
.../org/apache/hudi/sink/TestWriteCopyOnWrite.java | 2 +-
.../org/apache/hudi/sink/TestWriteMergeOnRead.java | 51 -
.../TestAppendWriteFunctionWithBufferSort.java | 163 ---
.../hudi/sink/utils/MockStateSnapshotContext.java | 41 -
.../sink/utils/StreamWriteFunctionWrapper.java | 48 +-
.../apache/hudi/sink/utils/TestEventBuffers.java | 42 +
.../org/apache/hudi/sink/utils/TestWriteBase.java | 13 -
.../source/reader/TestHoodieSourceSplitReader.java | 198 +--
.../hudi/source/reader/TestRecordLimiter.java | 372 ------
.../function/TestAbstractSplitReaderFunction.java | 246 ----
.../function/TestHoodieCdcSplitReaderFunction.java | 63 +-
.../function/TestHoodieSplitReaderFunction.java | 58 -
.../split/TestDefaultHoodieSplitProvider.java | 3 +-
.../assign/TestHoodieSplitBucketAssigner.java | 401 +++---
.../apache/hudi/table/ITTestHoodieDataSource.java | 55 -
.../apache/hudi/avro/HoodieAvroWriteSupport.java | 43 +-
.../hudi/common/util/ParquetReaderIterator.java | 16 +-
.../hudi/hadoop/fs/HoodieWrapperFileSystem.java | 10 -
.../hudi/io/lance/HoodieBaseLanceWriter.java | 22 -
.../avro/AvroSchemaConverterWithTimestampNTZ.java | 1 -
...TestHoodieAvroWriteSupportVariantShredding.java | 508 ++++++++
.../hudi/common/table/TestHoodieTableConfig.java | 1 -
.../hudi/common/util/TestCompactionUtils.java | 315 -----
.../common/util/TestParquetReaderIterator.java | 2 +-
...oodieAvroFileWriterFactoryVariantShredding.java | 275 ++++
.../apache/hudi/common/metrics/LocalRegistry.java | 5 -
.../org/apache/hudi/common/metrics/Registry.java | 107 +-
hudi-spark-datasource/hudi-spark-common/pom.xml | 5 -
.../scala/org/apache/hudi/DataSourceOptions.scala | 26 +-
.../scala/org/apache/hudi/HoodieFileIndex.scala | 7 -
.../hudi/HoodieHadoopFsRelationFactory.scala | 5 +-
.../org/apache/hudi/IncrementalRelationV1.scala | 38 +-
.../org/apache/hudi/IncrementalRelationV2.scala | 38 +-
.../apache/hudi/SparkHoodieTableFileIndex.scala | 32 +-
.../hudi/metadata/CatalogBackedTableMetadata.scala | 123 --
.../apache/hudi/util/IncrementalRelationUtil.scala | 110 --
.../plans/logical/HoodieTableChanges.scala | 7 +-
.../HoodieVectorSearchTableValuedFunction.scala | 203 ---
.../spark/sql/hudi/HoodieSqlCommonUtils.scala | 71 +-
.../hudi/analysis/HoodieSparkBaseAnalysis.scala | 81 +-
.../analysis/HoodieVectorSearchPlanBuilder.scala | 329 -----
.../sql/hudi/analysis/TableValuedFunctions.scala | 12 +-
.../sql/hudi/analysis/VectorDistanceUtils.scala | 136 --
.../sql/hudi/streaming/HoodieStreamSourceV1.scala | 4 +-
.../sql/hudi/streaming/HoodieStreamSourceV2.scala | 4 +-
.../org/apache/spark/sql/types/BlobType.scala | 41 -
hudi-spark-datasource/hudi-spark/pom.xml | 5 -
.../io/storage/TestHoodieSparkLanceReader.java | 15 +-
.../io/storage/TestHoodieSparkLanceWriter.java | 153 +--
.../org/apache/hudi/blob/BlobTestHelpers.scala | 129 --
.../org/apache/hudi/blob/TestBlobSupport.scala | 142 --
.../apache/hudi/functional/TestCOWDataSource.scala | 71 -
.../TestHoodieVectorSearchFunction.scala | 1359 --------------------
.../TestIncrementalQueryColumnPruning.scala | 242 ----
.../hudi/functional/TestLanceDataSource.scala | 69 +-
.../apache/hudi/functional/TestMORDataSource.scala | 88 --
.../hudi/functional/TestVectorDataSource.scala | 26 -
.../spark/sql/avro/TestSchemaConverters.scala | 50 -
.../common/TestCatalogBackedTableMetadata.scala | 574 ---------
.../hudi/common/TestInstantTimeValidation.scala | 278 ----
.../spark/sql/hudi/ddl/TestCreateTable.scala | 124 --
.../sql/hudi/dml/schema/TestVariantDataType.scala | 300 ++++-
.../spark/sql/adapter/BaseSpark3Adapter.scala | 8 +
.../HoodieSpark3_3ExtendedSqlAstBuilder.scala | 29 +-
.../parser/HoodieSpark3_3ExtendedSqlParser.scala | 3 +-
.../HoodieSpark3_4ExtendedSqlAstBuilder.scala | 29 +-
.../parser/HoodieSpark3_4ExtendedSqlParser.scala | 3 +-
.../HoodieSpark3_5ExtendedSqlAstBuilder.scala | 29 +-
.../parser/HoodieSpark3_5ExtendedSqlParser.scala | 3 +-
.../spark/sql/adapter/BaseSpark4Adapter.scala | 27 +-
.../TestSpark4VariantShreddingProvider.java | 279 ++++
.../apache/spark/sql/avro/AvroDeserializer.scala | 23 +-
.../HoodieSpark4_0ExtendedSqlAstBuilder.scala | 29 +-
.../parser/HoodieSpark4_0ExtendedSqlParser.scala | 3 +-
.../TestHoodieRowParquetWriteSupportVariant.java | 2 +-
.../apache/hudi/utilities/HoodieClusteringJob.java | 68 +-
.../org/apache/hudi/utilities/UtilHelpers.java | 14 -
.../utilities/config/HoodieStreamerConfig.java | 12 -
.../apache/hudi/utilities/sources/RowSource.java | 5 +-
.../utilities/streamer/SourceFormatAdapter.java | 12 +-
.../apache/hudi/utilities/streamer/StreamSync.java | 8 +-
.../org/apache/hudi/utilities/TestUtilHelpers.java | 141 --
.../deltastreamer/TestSourceFormatAdapter.java | 94 --
.../offlinejob/TestHoodieClusteringJob.java | 151 ---
pom.xml | 6 -
146 files changed, 3860 insertions(+), 9968 deletions(-)
delete mode 100644 hudi-client/hudi-client-common/src/main/java/org/apache/hudi/util/DistributedRegistryUtil.java
delete mode 100644 hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/common/table/log/TestLogReaderUtils.java
delete mode 100644 hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/metrics/TestDistributedRegistry.java
delete mode 100644 hudi-common/src/main/java/org/apache/hudi/util/PartitionPathFilterUtil.java
delete mode 100644 hudi-common/src/test/java/org/apache/hudi/util/TestPartitionPathFilterUtil.java
delete mode 100644 hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/RecordLimiter.java
delete mode 100644 hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/function/AbstractSplitReaderFunction.java
delete mode 100644 hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcImageManager.java
delete mode 100644 hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java
delete mode 100644 hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/MockStateSnapshotContext.java
delete mode 100644 hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/TestRecordLimiter.java
delete mode 100644 hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/function/TestAbstractSplitReaderFunction.java
create mode 100644 hudi-hadoop-common/src/test/java/org/apache/hudi/avro/TestHoodieAvroWriteSupportVariantShredding.java
create mode 100644 hudi-hadoop-common/src/test/java/org/apache/hudi/io/storage/hadoop/TestHoodieAvroFileWriterFactoryVariantShredding.java
delete mode 100644 hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/metadata/CatalogBackedTableMetadata.scala
delete mode 100644 hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/util/IncrementalRelationUtil.scala
delete mode 100644 hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/HoodieVectorSearchTableValuedFunction.scala
delete mode 100644 hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieVectorSearchPlanBuilder.scala
delete mode 100644 hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/analysis/VectorDistanceUtils.scala
delete mode 100644 hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/types/BlobType.scala
delete mode 100644 hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/blob/BlobTestHelpers.scala
delete mode 100644 hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/blob/TestBlobSupport.scala
delete mode 100644 hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestHoodieVectorSearchFunction.scala
delete mode 100644 hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestIncrementalQueryColumnPruning.scala
delete mode 100644 hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/common/TestCatalogBackedTableMetadata.scala
delete mode 100644 hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/common/TestInstantTimeValidation.scala
create mode 100644 hudi-spark-datasource/hudi-spark4-common/src/test/java/org/apache/hudi/variant/TestSpark4VariantShreddingProvider.java