rdblue commented on a change in pull request #3293:
URL: https://github.com/apache/iceberg/pull/3293#discussion_r729972467
##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java
##########
@@ -187,11 +215,6 @@ private void flushRowGroup(boolean finished) {
private void startRowGroup() {
Preconditions.checkState(!closed, "Writer is closed");
- try {
- this.nextRowGroupSize = Math.min(writer.getNextRowGroupSize(),
targetRowGroupSize);
- } catch (IOException e) {
- throw new RuntimeIOException(e);
- }
Review comment:
This looks suspicious. Why did you remove it?
##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java
##########
@@ -96,21 +99,46 @@
this.model = (ParquetValueWriter<T>) createWriterFunc.apply(parquetSchema);
this.metricsConfig = metricsConfig;
this.columnIndexTruncateLength = conf.getInt(COLUMN_INDEX_TRUNCATE_LENGTH,
DEFAULT_COLUMN_INDEX_TRUNCATE_LENGTH);
+ this.writeMode = writeMode;
+ this.output = output;
+ this.conf = conf;
- try {
- this.writer = new ParquetFileWriter(ParquetIO.file(output, conf),
parquetSchema,
- writeMode, rowGroupSize, 0);
- } catch (IOException e) {
- throw new RuntimeIOException(e, "Failed to create Parquet file");
+ startRowGroup();
+
+ if (!lazy) {
+ this.getWriter();
}
+ }
- try {
- writer.start();
- } catch (IOException e) {
- throw new RuntimeIOException(e, "Failed to start Parquet file writer");
+ @SuppressWarnings("unchecked")
+ ParquetWriter(Configuration conf, OutputFile output, Schema schema, long
rowGroupSize,
+ Map<String, String> metadata,
+ Function<MessageType, ParquetValueWriter<?>> createWriterFunc,
+ CompressionCodecName codec,
+ ParquetProperties properties,
+ MetricsConfig metricsConfig,
+ ParquetFileWriter.Mode writeMode) {
+ this(conf, output, schema, rowGroupSize, metadata, createWriterFunc,
+ codec, properties, metricsConfig, writeMode, false);
+ }
+
+ private ParquetFileWriter getWriter() {
Review comment:
No need for `get` since this returns a writer. We avoid `get` because it
doesn't carry any useful information. In this case you can omit it but in most
you would want to use a more descriptive verb.
##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java
##########
@@ -96,21 +99,46 @@
this.model = (ParquetValueWriter<T>) createWriterFunc.apply(parquetSchema);
this.metricsConfig = metricsConfig;
this.columnIndexTruncateLength = conf.getInt(COLUMN_INDEX_TRUNCATE_LENGTH,
DEFAULT_COLUMN_INDEX_TRUNCATE_LENGTH);
+ this.writeMode = writeMode;
+ this.output = output;
+ this.conf = conf;
- try {
- this.writer = new ParquetFileWriter(ParquetIO.file(output, conf),
parquetSchema,
- writeMode, rowGroupSize, 0);
- } catch (IOException e) {
- throw new RuntimeIOException(e, "Failed to create Parquet file");
+ startRowGroup();
+
+ if (!lazy) {
+ this.getWriter();
}
+ }
- try {
- writer.start();
- } catch (IOException e) {
- throw new RuntimeIOException(e, "Failed to start Parquet file writer");
+ @SuppressWarnings("unchecked")
+ ParquetWriter(Configuration conf, OutputFile output, Schema schema, long
rowGroupSize,
+ Map<String, String> metadata,
+ Function<MessageType, ParquetValueWriter<?>> createWriterFunc,
+ CompressionCodecName codec,
+ ParquetProperties properties,
+ MetricsConfig metricsConfig,
+ ParquetFileWriter.Mode writeMode) {
+ this(conf, output, schema, rowGroupSize, metadata, createWriterFunc,
+ codec, properties, metricsConfig, writeMode, false);
Review comment:
Nit: continuing indentation is 2 indents / 4 spaces, not 8.
##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java
##########
@@ -96,21 +99,46 @@
this.model = (ParquetValueWriter<T>) createWriterFunc.apply(parquetSchema);
this.metricsConfig = metricsConfig;
this.columnIndexTruncateLength = conf.getInt(COLUMN_INDEX_TRUNCATE_LENGTH,
DEFAULT_COLUMN_INDEX_TRUNCATE_LENGTH);
+ this.writeMode = writeMode;
+ this.output = output;
+ this.conf = conf;
- try {
- this.writer = new ParquetFileWriter(ParquetIO.file(output, conf),
parquetSchema,
- writeMode, rowGroupSize, 0);
- } catch (IOException e) {
- throw new RuntimeIOException(e, "Failed to create Parquet file");
+ startRowGroup();
+
+ if (!lazy) {
+ this.getWriter();
}
+ }
- try {
- writer.start();
- } catch (IOException e) {
- throw new RuntimeIOException(e, "Failed to start Parquet file writer");
+ @SuppressWarnings("unchecked")
+ ParquetWriter(Configuration conf, OutputFile output, Schema schema, long
rowGroupSize,
+ Map<String, String> metadata,
+ Function<MessageType, ParquetValueWriter<?>> createWriterFunc,
+ CompressionCodecName codec,
+ ParquetProperties properties,
+ MetricsConfig metricsConfig,
+ ParquetFileWriter.Mode writeMode) {
+ this(conf, output, schema, rowGroupSize, metadata, createWriterFunc,
+ codec, properties, metricsConfig, writeMode, false);
+ }
+
+ private ParquetFileWriter getWriter() {
+ if (this.writer == null) {
+ try {
+ this.writer = new ParquetFileWriter(ParquetIO.file(output, conf),
parquetSchema,
+ writeMode, targetRowGroupSize, 0);
+ } catch (IOException e) {
+ throw new RuntimeIOException(e, "Failed to create Parquet file");
+ }
+
+ try {
+ writer.start();
+ } catch (IOException e) {
+ throw new RuntimeIOException(e, "Failed to start Parquet file writer");
+ }
}
- startRowGroup();
+ return this.writer;
Review comment:
No need to use `this.` when getting the value of a field, only when
setting a field.
##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java
##########
@@ -183,6 +184,16 @@ public WriteBuilder writerVersion(WriterVersion version) {
return this;
}
+ public WriteBuilder lazy() {
+ this.lazyWriter = true;
+ return this;
+ }
+
+ public WriteBuilder lazy(boolean lazy) {
+ this.lazyWriter = lazy;
+ return this;
+ }
+
Review comment:
I don't think that it is worth exposing this as an option. We should
either lazily create the writer or always create it.
##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java
##########
@@ -187,11 +215,6 @@ private void flushRowGroup(boolean finished) {
private void startRowGroup() {
Preconditions.checkState(!closed, "Writer is closed");
- try {
- this.nextRowGroupSize = Math.min(writer.getNextRowGroupSize(),
targetRowGroupSize);
- } catch (IOException e) {
- throw new RuntimeIOException(e);
- }
Review comment:
Thanks, sorry that I missed that originally.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]