From Preetham Poluparthi <[email protected]>:
Preetham Poluparthi has uploaded this change for review. (
https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20788?usp=email )
Change subject: [ASTERIXDB-3392][EXT] Fix false warnings while querying parquet
......................................................................
[ASTERIXDB-3392][EXT] Fix false warnings while querying parquet
- user model changes: no
- storage format changes: no
- interface changes: no
Details:
Querying Parquet files reported false warning counts; this patch fixes that.
It also corrects the extension used for compressed Parquet file names: files
were previously named .parquet.zstd and are now named .zstd.parquet.
Ext-ref: MB-70108
Change-Id: Id2dc25a30ea1bf7012f945803befc2751f33b86a
---
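A minimal sketch of the renaming described in the commit message, for
illustration only. The class name FileExtensionSketch and the method name
fileExtension are hypothetical, and FORMAT_PARQUET is assumed to be the string
"parquet" (its value is not shown in this change). Only the branching mirrors
the ExternalWriterProvider hunk in the diff.

```java
import java.util.Locale;

public class FileExtensionSketch {
    // Assumed value of ExternalDataConstants.FORMAT_PARQUET; not shown in the diff.
    static final String FORMAT_PARQUET = "parquet";

    // Hypothetical stand-in for the patched logic in ExternalWriterProvider.
    public static String fileExtension(String format, String compression) {
        if (format.equalsIgnoreCase(FORMAT_PARQUET)) {
            // Parquet convention puts the codec first, e.g. snappy.parquet.
            // Locale.ROOT is used in this sketch; the patch calls toLowerCase().
            return (compression.isEmpty() ? "" : compression.toLowerCase(Locale.ROOT) + ".")
                    + FORMAT_PARQUET;
        }
        // Every other format keeps the old order: format first, then codec.
        return format + (compression.isEmpty() ? "" : "." + compression);
    }

    public static void main(String[] args) {
        System.out.println(fileExtension("parquet", "zstd")); // zstd.parquet
        System.out.println(fileExtension("parquet", ""));     // parquet
        System.out.println(fileExtension("json", "gzip"));    // json.gzip
    }
}
```

Under these assumptions, a Parquet sink with ZSTD compression yields
zstd.parquet, while non-Parquet formats are unaffected by the new branch.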
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/input/HDFSDataSourceFactory.java
M asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/provider/ExternalWriterProvider.java
2 files changed, 7 insertions(+), 3 deletions(-)
git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/88/20788/1
diff --git a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/input/HDFSDataSourceFactory.java b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/input/HDFSDataSourceFactory.java
index b820147..82653c2 100644
--- a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/input/HDFSDataSourceFactory.java
+++ b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/input/HDFSDataSourceFactory.java
@@ -321,9 +321,8 @@
}
restoreConfig(ctx);
JobConf readerConf = conf;
- if (ctx.getWarningCollector().shouldWarn()
- && configuration.get(ExternalDataConstants.KEY_INPUT_FORMAT.trim())
- .equals(ExternalDataConstants.INPUT_FORMAT_PARQUET)) {
+ if (configuration.get(ExternalDataConstants.KEY_INPUT_FORMAT.trim())
+ .equals(ExternalDataConstants.INPUT_FORMAT_PARQUET)) {
/*
* JobConf is used to pass warnings from the ParquetReadSupport to ParquetReader. As multiple
* partitions can issue different warnings, we might have a race condition on JobConf. Thus, we
diff --git a/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/provider/ExternalWriterProvider.java b/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/provider/ExternalWriterProvider.java
index bdfffa0..763a7a1 100644
--- a/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/provider/ExternalWriterProvider.java
+++ b/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/provider/ExternalWriterProvider.java
@@ -99,6 +99,11 @@
Map<String, String> configuration = sink.getConfiguration();
String format = getFormat(configuration);
String compression = getCompression(configuration);
+ if (format.equalsIgnoreCase(ExternalDataConstants.FORMAT_PARQUET)) {
+ // Parquet file extension format is like .snappy.parquet
+ return (compression.isEmpty() ? "" : compression.toLowerCase() + ".")
+ + ExternalDataConstants.FORMAT_PARQUET;
+ }
return format + (compression.isEmpty() ? "" : "." + compression);
}
--
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20788?usp=email
To unsubscribe, or for help writing mail filters, visit
https://asterix-gerrit.ics.uci.edu/settings?usp=email
Gerrit-MessageType: newchange
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: Id2dc25a30ea1bf7012f945803befc2751f33b86a
Gerrit-Change-Number: 20788
Gerrit-PatchSet: 1
Gerrit-Owner: Preetham Poluparthi <[email protected]>