[
https://issues.apache.org/jira/browse/LUCENE-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-6241:
--------------------------------
Attachment: LUCENE-6241-alternative.patch
Here is the alternative FSDir patch. For Lucene, this is fine.
But I am hesitant, because I think the ability to explicitly check for a
subdirectory might be needed, even though it's not needed by Lucene. My main
concern is honestly .DS_Store and things like that. In this case it's not really
an abuse case; the user is a victim.
On the other hand, code that needs to filter out trash can already do so
cleanly. For example, Lucene's replication module only replicates index files
matching IndexFileNames.CODEC_PATTERN. This will also take care of Windows
Thumbs.db or whatever too. So maybe this patch is fine, and this should be the
recommended approach?
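
A minimal sketch of that name-based filtering, assuming a stand-in regex
modeled on IndexFileNames.CODEC_PATTERN (check the real constant in your Lucene
version). The class name, the method name, and the extra "segments" prefix check
are hypothetical and only illustrate the idea:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class IndexFileFilter {
    // Assumption: a stand-in for IndexFileNames.CODEC_PATTERN, not the
    // constant itself. It accepts names like "_0.cfs" or "_1_Lucene50_0.doc".
    static final Pattern CODEC_LIKE = Pattern.compile("_[a-z0-9]+(_.*)?\\..*");

    // Keeps only names that look like index files. Nothing here touches the
    // filesystem, so no per-file metadata call is needed.
    static List<String> keepIndexFiles(String[] listed) {
        List<String> kept = new ArrayList<>();
        for (String name : listed) {
            // Assumption: commit files ("segments_N") are also wanted.
            if (CODEC_LIKE.matcher(name).matches() || name.startsWith("segments")) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

With this filter, names such as .DS_Store, Thumbs.db, or an ordinary
subdirectory name are dropped by name alone. A subdirectory whose name happens
to match the pattern would still pass, so this is not a substitute for an
explicit directory check.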
> don't filter subdirectories in listAll()
> ----------------------------------------
>
> Key: LUCENE-6241
> URL: https://issues.apache.org/jira/browse/LUCENE-6241
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Attachments: LUCENE-6241-alternative.patch, LUCENE-6241.patch,
> LUCENE-6241.patch
>
>
> The issue is that today, filtering subdirectories means listAll() is always
> slow, sometimes MUCH slower, because it must do the fstat()-equivalent of
> each file to check if it's a directory to filter it out.
> When I benchmarked this on a fast filesystem, doing all these filesystem
> metadata calls only makes listAll() 2.6x slower, but on a non-SSD, slower
> I/O, it can be more than 60x slower.
> Lucene doesn't make subdirectories, so hiding these for abuse cases just
> makes real use cases slower.
> To add insult to injury, most code (e.g. all of Lucene except for where
> RAMDir copies from an FSDir) does not actually care if extraneous files are
> directories or not.
> Finally it sucks the name is listAll() when it is doing anything but that.
> I really hate to add a method here to deal with this abusive stuff, but I'd
> rather add isDirectory(String) for the rare code that wants to filter out,
> than just let stuff always be slow.
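
To make the cost described above concrete, here is a rough sketch using plain
java.nio.file rather than Lucene's FSDirectory. The class and method names are
hypothetical. The first method only reads directory entries; the second adds
one metadata lookup per entry, which is the extra work the description refers
to:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ListingSketch {
    // Returns every entry name in dir, files and subdirectories alike.
    static List<String> listNames(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        return names;
    }

    // Same listing, but skips subdirectories. Files.isDirectory() reads the
    // entry's attributes, so this costs one extra metadata call per entry.
    static List<String> listNamesSkippingDirs(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (!Files.isDirectory(p)) {
                    names.add(p.getFileName().toString());
                }
            }
        }
        return names;
    }
}
```

Timing the two methods on a large directory is one way to see the difference,
though the ratio will depend on the filesystem and on caching.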