[ https://issues.apache.org/jira/browse/AVRO-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099366#comment-17099366 ]
Hudson commented on AVRO-2802:
------------------------------
SUCCESS: Integrated in Jenkins build AvroJava #857 (See
[https://builds.apache.org/job/AvroJava/857/])
AVRO-2802: Pre-Size List in AvroInputFormat Avro File Lookup (#857) (github:
[https://github.com/apache/avro/commit/5f7b068663671bb1d4a810c35e3a4a55815bc1ef])
* (edit)
lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroInputFormat.java
> Pre-Size List in AvroInputFormat Avro File Lookup
> -------------------------------------------------
>
> Key: AVRO-2802
> URL: https://issues.apache.org/jira/browse/AVRO-2802
> Project: Apache Avro
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Minor
> Fix For: 1.10.0
>
>
> {code:java}
> if (job.getBoolean(IGNORE_FILES_WITHOUT_EXTENSION_KEY,
>     IGNORE_INPUTS_WITHOUT_EXTENSION_DEFAULT)) {
>   List<FileStatus> result = new ArrayList<>();
>   for (FileStatus file : super.listStatus(job))
>     if (file.getPath().getName().endsWith(AvroOutputFormat.EXT))
>       result.add(file);
>   return result.toArray(new FileStatus[0]);
> } else {
>   return super.listStatus(job);
> }
> {code}
> When a user runs an Avro MR job against a directory, the job silently
> filters out files without an avro file extension. Fair enough. However,
> anecdotally, a directory of avro files is the primary use scenario, so this
> code probably does not filter out many files.
> I suggest that this {{ArrayList}} be pre-sized. If there are a lot of files,
> and all of them have the avro file extension (the common case), this
> {{ArrayList}} will have to be expanded multiple times (time and GC). If there
> is a large list and it gets filtered down a lot, a few hundred bytes are
> wasted.
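
For illustration, the pre-sizing suggested above might look roughly like the sketch below. This is not necessarily the committed change: the class and method names are hypothetical, plain file names stand in for Hadoop's FileStatus so the example runs without Hadoop on the classpath, and the ".avro" constant is assumed to match AvroOutputFormat.EXT.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the listing logic; file names replace FileStatus.
final class PreSizedFilterSketch {
  // Assumed to mirror AvroOutputFormat.EXT; hard-coded to stay self-contained.
  static final String EXT = ".avro";

  static String[] keepAvroFiles(List<String> names) {
    // Pre-size to the input length so the backing array never has to grow,
    // even when every name passes the filter.
    List<String> result = new ArrayList<>(names.size());
    for (String name : names) {
      if (name.endsWith(EXT)) {
        result.add(name);
      }
    }
    return result.toArray(new String[0]);
  }

  public static void main(String[] args) {
    List<String> listing = Arrays.asList("part-0.avro", "_SUCCESS", "part-1.avro");
    System.out.println(Arrays.toString(keepAvroFiles(listing)));
  }
}
```

The trade-off is the one described in the issue: when most entries are filtered out, the unused capacity costs one reference slot per dropped entry until the list is discarded.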
--
This message was sent by Atlassian Jira
(v8.3.4#803005)