[ https://issues.apache.org/jira/browse/AVRO-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099366#comment-17099366 ]

Hudson commented on AVRO-2802:
------------------------------

SUCCESS: Integrated in Jenkins build AvroJava #857 (See [https://builds.apache.org/job/AvroJava/857/])
AVRO-2802: Pre-Size List in AvroInputFormat Avro File Lookup (#857) (github: [https://github.com/apache/avro/commit/5f7b068663671bb1d4a810c35e3a4a55815bc1ef])
* (edit) lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroInputFormat.java


> Pre-Size List in AvroInputFormat Avro File Lookup
> -------------------------------------------------
>
>                 Key: AVRO-2802
>                 URL: https://issues.apache.org/jira/browse/AVRO-2802
>             Project: Apache Avro
>          Issue Type: Improvement
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>             Fix For: 1.10.0
>
>
> {code:java}
     if (job.getBoolean(IGNORE_FILES_WITHOUT_EXTENSION_KEY, ">
>     if (job.getBoolean(IGNORE_FILES_WITHOUT_EXTENSION_KEY, IGNORE_INPUTS_WITHOUT_EXTENSION_DEFAULT)) {
>       List<FileStatus> result = new ArrayList<>();
>       for (FileStatus file : super.listStatus(job))
>         if (file.getPath().getName().endsWith(AvroOutputFormat.EXT))
>           result.add(file);
>       return result.toArray(new FileStatus[0]);
>     } else {
>       return super.listStatus(job);
>     }
> {code}
> When a user runs an Avro MR job against a directory, it silently filters out
> files without an Avro file extension. Fair enough. However, anecdotally, the
> primary use scenario is a directory in which every file already has that
> extension, so this code probably does not filter out many files.
> I suggest that this {{ArrayList}} be pre-sized to the number of files returned
> by {{super.listStatus(job)}}. If there are a lot of files and all of them have
> the Avro file extension (the base case), an unsized {{ArrayList}} has to be
> expanded multiple times, costing time and GC pressure. If a large list gets
> filtered down a lot, pre-sizing wastes only a few hundred bytes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
