[ https://issues.apache.org/jira/browse/AVRO-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099310#comment-17099310 ]
ASF subversion and git services commented on AVRO-2802:
-------------------------------------------------------
Commit 5f7b068663671bb1d4a810c35e3a4a55815bc1ef in avro's branch
refs/heads/master from belugabehr
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=5f7b068 ]
AVRO-2802: Pre-Size List in AvroInputFormat Avro File Lookup (#857)
Co-authored-by: David Mollitor <[email protected]>
> Pre-Size List in AvroInputFormat Avro File Lookup
> -------------------------------------------------
>
> Key: AVRO-2802
> URL: https://issues.apache.org/jira/browse/AVRO-2802
> Project: Apache Avro
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Minor
>
> {code:java}
> if (job.getBoolean(IGNORE_FILES_WITHOUT_EXTENSION_KEY,
>     IGNORE_INPUTS_WITHOUT_EXTENSION_DEFAULT)) {
>   List<FileStatus> result = new ArrayList<>();
>   for (FileStatus file : super.listStatus(job))
>     if (file.getPath().getName().endsWith(AvroOutputFormat.EXT))
>       result.add(file);
>   return result.toArray(new FileStatus[0]);
> } else {
>   return super.listStatus(job);
> }
> {code}
> When a user runs an Avro MR job against a directory, this code silently
> filters out files without an avro file extension. Fair enough. However,
> anecdotally, the primary use scenario is a directory that holds only avro
> files, so this code probably does not filter out many files.
> I suggest that this {{ArrayList}} be pre-sized. If there are a lot of files,
> and all of them have the avro file extension (base case), this {{ArrayList}}
> will have to be expanded multiple times (time and GC). If there is a large
> list and it gets filtered down a lot, a few hundred bytes are wasted.
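
The suggestion above can be sketched as follows. This is a minimal, standalone illustration and not the committed patch: plain file names stand in for Hadoop's FileStatus objects, and the class name PreSizedFilter, the method filterByExtension, and the local EXT constant are hypothetical names chosen so the example runs without Hadoop or Avro on the classpath. The only point it makes is that the list's initial capacity is set to the size of the unfiltered listing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PreSizedFilter {
    // Hypothetical stand-in for AvroOutputFormat.EXT (assumed here to be ".avro").
    static final String EXT = ".avro";

    // Keeps only the names that end with the given extension.
    static String[] filterByExtension(List<String> names, String ext) {
        // Pre-size to the input length: if nothing is filtered out, the
        // backing array never has to grow while elements are added.
        List<String> result = new ArrayList<>(names.size());
        for (String name : names) {
            if (name.endsWith(ext)) {
                result.add(name);
            }
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList("part-0.avro", "_SUCCESS", "part-1.avro");
        // prints [part-0.avro, part-1.avro]
        System.out.println(Arrays.toString(filterByExtension(listing, EXT)));
    }
}
```

In the snippet quoted in the issue, the equivalent change would presumably be to store the result of super.listStatus(job) in a local array and pass its length to the ArrayList constructor. The trade-off is the one the description names: no regrowth when every file matches, at the cost of some unused capacity when most files are filtered out.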
--
This message was sent by Atlassian Jira
(v8.3.4#803005)