[
https://issues.apache.org/jira/browse/AVRO-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fokko Driesprong resolved AVRO-2802.
------------------------------------
Fix Version/s: 1.10.0
Resolution: Fixed
> Pre-Size List in AvroInputFormat Avro File Lookup
> -------------------------------------------------
>
> Key: AVRO-2802
> URL: https://issues.apache.org/jira/browse/AVRO-2802
> Project: Apache Avro
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Minor
> Fix For: 1.10.0
>
>
> {code:java}
> if (job.getBoolean(IGNORE_FILES_WITHOUT_EXTENSION_KEY,
>     IGNORE_INPUTS_WITHOUT_EXTENSION_DEFAULT)) {
>   List<FileStatus> result = new ArrayList<>();
>   for (FileStatus file : super.listStatus(job))
>     if (file.getPath().getName().endsWith(AvroOutputFormat.EXT))
>       result.add(file);
>   return result.toArray(new FileStatus[0]);
> } else {
>   return super.listStatus(job);
> }
> {code}
> When a user runs an Avro MR job against a directory, it silently filters out
> files without an Avro file extension. Fair enough. However, anecdotally,
> a directory of Avro files is the primary use scenario, so this code probably
> does not filter out many files.
> I suggest that this {{ArrayList}} be pre-sized. If there are a lot of files,
> and all of them have the Avro file extension (base case), this {{ArrayList}}
> will have to be expanded multiple times (costing time and GC). If instead a
> large list gets filtered down a lot, pre-sizing wastes only a few hundred bytes.
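For illustration, the pre-sizing idea described above can be sketched without any Hadoop dependency. This is a minimal sketch, not the committed patch: the class name PreSizedFilter and the method filterByExtension are hypothetical, plain file names stand in for FileStatus objects, and the local EXT constant stands in for AvroOutputFormat.EXT. The key point is that the full listing is fetched once, and its length is used as the initial capacity of the result list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the filtering logic in AvroInputFormat.
// Plain file names are used instead of Hadoop FileStatus objects.
class PreSizedFilter {
    // Assumption: mirrors AvroOutputFormat.EXT, kept local so the sketch is self-contained.
    static final String EXT = ".avro";

    static String[] filterByExtension(String[] names) {
        // Pre-size to the number of candidates: in the common case where
        // every file matches, the backing array never has to grow.
        List<String> result = new ArrayList<>(names.length);
        for (String name : names) {
            if (name.endsWith(EXT)) {
                result.add(name);
            }
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] listing = {"part-0.avro", "_SUCCESS", "part-1.avro"};
        for (String name : filterByExtension(listing)) {
            System.out.println(name);
        }
    }
}
```

The trade-off is the one stated in the issue: when most entries are filtered out, the list holds some unused capacity until it is discarded, which is small next to the cost of repeated resizing in the common case.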
--
This message was sent by Atlassian Jira
(v8.3.4#803005)