[
https://issues.apache.org/jira/browse/BEAM-6526?focusedWorklogId=192267&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192267
]
ASF GitHub Bot logged work on BEAM-6526:
----------------------------------------
Author: ASF GitHub Bot
Created on: 30/Jan/19 13:18
Start Date: 30/Jan/19 13:18
Worklog Time Spent: 10m
Work Description: iemejia commented on pull request #7672: [BEAM-6526] Add ReadFiles transform for AvroIO
URL: https://github.com/apache/beam/pull/7672#discussion_r252251142
##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
##########
@@ -297,6 +297,23 @@
.build();
}
+ /**
+ * Like {@link #read}, but reads each file in a {@link PCollection} of {@link
+ * FileIO.ReadableFile}, returned by {@link FileIO#readMatches}.
+ */
+ public static <T> ReadFiles<T> readFiles(Class<T> recordClass) {
+ return new AutoValue_AvroIO_ReadFiles.Builder<T>()
+ .setMatchConfiguration(MatchConfiguration.create(EmptyMatchTreatment.ALLOW_IF_WILDCARD))
+ .setRecordClass(recordClass)
+ .setSchema(ReflectData.get().getSchema(recordClass))
+ .setInferBeamSchema(false)
+ // 64MB is a reasonable value that allows to amortize the cost of opening files,
+ // but is not so large as to exhaust a typical runner's maximum amount of output per
+ // ProcessElement call.
+ .setDesiredBundleSizeBytes(64 * 1024 * 1024L)
Review comment:
Yes good idea, I will extract the constant and move the doc there.
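
A minimal sketch of what extracting the constant and moving the doc onto it might look like. The class name `BundleSizeDefaults` and the constant name `DEFAULT_BUNDLE_SIZE_BYTES` are assumptions made for illustration; they are not taken from the pull request, where the constant would presumably live inside `AvroIO` itself.

```java
/** Hypothetical holder class, used here only to keep the sketch self-contained. */
final class BundleSizeDefaults {
  /**
   * 64MB is a reasonable value that amortizes the cost of opening files, but is not so large
   * as to exhaust a typical runner's maximum amount of output per ProcessElement call.
   * (Constant name is an assumption, not the name used in the PR.)
   */
  static final long DEFAULT_BUNDLE_SIZE_BYTES = 64 * 1024 * 1024L;

  // Not instantiable: this class only carries the constant.
  private BundleSizeDefaults() {}
}
```

With such a constant in place, the builder call in the diff could read `.setDesiredBundleSizeBytes(DEFAULT_BUNDLE_SIZE_BYTES)`, and the explanation would be stated once on the constant rather than inline at each call site.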
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 192267)
Time Spent: 0.5h (was: 20m)
> Add ReadFiles transform for AvroIO
> ----------------------------------
>
> Key: BEAM-6526
> URL: https://issues.apache.org/jira/browse/BEAM-6526
> Project: Beam
> Issue Type: Improvement
> Components: io-java-avro
> Reporter: Ismaël Mejía
> Assignee: Ismaël Mejía
> Priority: Minor
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> AvroIO lacks a `readFiles()` method, which would make it fully composable with
> FileIO, as other file-based IOs such as TextIO and ParquetIO already are.