voonhous commented on code in PR #18837:
URL: https://github.com/apache/hudi/pull/18837#discussion_r3347310213
##########
hudi-trino-plugin/src/main/java/io/trino/plugin/hudi/HudiPageSource.java:
##########
@@ -115,8 +132,14 @@ public long getMemoryUsage()
public void close()
throws IOException
{
- fileGroupReader.close();
- pageSource.close();
+ // Closer attempts every close and aggregates failures via
addSuppressed, rethrowing the
+ // first. Registration is LIFO, so registering in reverse gives the
original close order:
+ // recordIterator (wraps the readers) first, then fileGroupReader,
then pageSource.
+ try (Closer closer = Closer.create()) {
+ closer.register(pageSource::close);
+ closer.register(fileGroupReader::close);
+ closer.register(recordIterator::close);
Review Comment:
Good catch. Switched `close()` to register only `recordIterator::close` in
the `Closer`. It's the outermost wrapper (`CloseableMappingIterator` ->
`HoodieFileGroupReaderIterator` -> `fileGroupReader.close()` ->
`baseFileIterator.close()`, which closes the Trino `pageSource`, plus
`recordBuffer.close()`), so every underlying resource is now closed exactly
once instead of 2-3x. Addressed in cce7bef.
##########
hudi-trino-plugin/src/main/java/io/trino/plugin/hudi/HudiUtil.java:
##########
@@ -78,10 +92,14 @@
import static io.trino.plugin.hudi.HudiErrorCode.HUDI_SCHEMA_ERROR;
import static io.trino.plugin.hudi.HudiErrorCode.HUDI_UNSUPPORTED_FILE_FORMAT;
import static java.lang.Math.toIntExact;
-import static org.apache.hudi.common.model.HoodieRecord.HOODIE_META_COLUMNS;
+import static
org.apache.hudi.common.model.HoodieRecord.PARTITION_PATH_METADATA_FIELD;
+import static
org.apache.hudi.common.model.HoodieRecord.RECORD_KEY_METADATA_FIELD;
public final class HudiUtil
Review Comment:
Renamed `HOODIE_META_COLUMNS` -> `HUDI_REQUIRED_META_COLUMNS` to make
explicit it's the minimal merge-required subset (record-key + partition-path),
distinct from the upstream 5-field `HoodieRecord.HOODIE_META_COLUMNS`. Updated
both usages in `HudiUtil`; the `HudiTestUtils` static import is the upstream
5-field constant, so it's left as-is. Addressed in 557bc9e.
##########
hudi-trino-plugin/src/main/java/io/trino/plugin/hudi/HudiUtil.java:
##########
@@ -397,4 +410,53 @@ public static Schema
getLatestTableSchema(HoodieTableMetaClient metaClient, Stri
throw new TrinoException(HUDI_FILESYSTEM_ERROR, e);
}
}
+
+ public static List<HiveColumnHandle> getOrderingColumnHandles(Table table,
TypeManager typeManager, Lazy<HoodieTableMetaClient> lazyMetaClient,
HiveTimestampPrecision timestampPrecision)
+ {
+ RecordMergeMode recordMergeMode =
lazyMetaClient.get().getTableConfig().getRecordMergeMode();
+ if (recordMergeMode == null || recordMergeMode ==
RecordMergeMode.COMMIT_TIME_ORDERING) {
+ // if commit time ordering is enabled, return empty list
+ return Collections.emptyList();
+ }
+
+ ImmutableList.Builder<HiveColumnHandle> columns =
ImmutableList.builder();
+ List<String> orderingColumnNames =
lazyMetaClient.get().getTableConfig().getOrderingFields();
+
+ int hiveColumnIndex = 0;
+ for (Column field : table.getDataColumns()) {
+ // ignore unsupported types rather than failing
+ if (orderingColumnNames.contains(field.getName())) {
+ HiveType hiveType = field.getType();
+ if (typeSupported(hiveType.getTypeInfo(),
table.getStorage().getStorageFormat())) {
+ columns.add(createBaseColumn(field.getName(),
hiveColumnIndex, hiveType, getType(hiveType, typeManager, timestampPrecision),
REGULAR, field.getComment()));
+ }
+ }
+ hiveColumnIndex++;
+ }
+
+ return columns.build();
+ }
+
+ /**
+ * Converts the given {@link HoodiePairData} into a {@link Map}.
+ * <p>
+ * Special handling is applied for null keys:
+ * <ul>
+ * <li>If a key is null, it is stored in the map as a {@code null}
entry.</li>
+ * <li>If multiple entries share the same key (including null), the
latest value overwrites the previous one.</li>
+ * </ul>
+ *
+ * @param pairData the HoodiePairData containing key-value pairs
+ * @param <K> the type of keys maintained by the resulting map
+ * @param <V> the type of mapped values
+ * @return a {@link Map} containing all key-value pairs from the input data
+ */
+ public static <K, V> Map<K, V> collectAsMap(HoodiePairData<K, V> pairData)
+ {
+ // HashMap allows null keys, so collect directly; on duplicate keys
the later entry wins.
+ return pairData.collectAsList().stream()
Review Comment:
Switched the three-arg `Stream.collect` to a plain `HashMap` + `forEach` as
suggested - clearer for a put-each-pair loop, same semantics (null keys
allowed, last write wins). Addressed in 9336b06.
##########
pom.xml:
##########
@@ -130,6 +129,12 @@
<hive.avro.version>1.11.4</hive.avro.version>
<presto.version>0.273</presto.version>
<trino.version>390</trino.version>
+ <!-- Trino SPI version that hudi-trino-plugin main code compiles against.
-->
+ <trino.connector.version>482-SNAPSHOT</trino.connector.version>
+ <!-- Trino version used only for *-tests.jar classifier deps. Trino does
not publish
+ test-jars for tagged releases, so this tracks a snapshot. Built
locally via the
Review Comment:
Resolved. Both `trino.connector.version` and `trino.connector.test.version`
are pinned to the released `481` now - no more 480/482 SNAPSHOTs and no
main-vs-test divergence. The main Trino artifacts (trino-spi, etc.) at 481 are
on Maven Central; the test-jars (trino-spi/-filesystem/-hive/-main test
classifiers) aren't published anywhere, so the Hudi Trino CI checks out the
`trinodb/trino` `481` tag and installs just those test-jars into the local m2
before building the connector (`.github/workflows/hudi_trino_ci.yml`).
Downstream contributors running the plugin tests do the same via the documented
build steps.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]