[
https://issues.apache.org/jira/browse/LUCENE-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318443#comment-14318443
]
Robert Muir commented on LUCENE-6241:
-------------------------------------
{quote}
Maybe we should simply remove RAMDir's copy ctor? That method seems
abusive/trappy to me too!
{quote}
Well, I think there is a real use case for this? However badly it might
perform, some people want to load their entire index into memory.
I would greatly prefer if we had an option for mmap that called
MappedByteBuffer.load() or something for these users. That one will even do
the correct madvise() call and read one byte from each page, which is
essentially as efficient as the OS can make it. I think it's a better
solution for that use case.
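A minimal sketch of that mmap-and-preload idea, assuming plain java.nio rather
than any existing Lucene option (the class and method names here are
hypothetical). In the JDK, MappedByteBuffer.load() is the call that asks the OS
to page a mapping into memory; force() instead flushes writes to storage.
load() is best effort, so pages may still be evicted later.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PreloadMmap {
    /**
     * Maps the whole file read-only and asks the OS to bring it into memory.
     * Hypothetical helper for illustration; not part of Lucene.
     */
    static MappedByteBuffer mapAndPreload(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // One mapping is limited to Integer.MAX_VALUE bytes;
            // a larger file would need several mappings.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Best-effort request to page the mapping in; no residency guarantee.
            buf.load();
            // The mapping stays valid after the channel is closed.
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("preload", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {1, 2, 3, 4});

        MappedByteBuffer buf = mapAndPreload(tmp);
        System.out.println(buf.remaining() + " bytes mapped");
    }
}
```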
In general, for this issue, I wanted to avoid controversies or larger changes
like this. My problem is that operations on the Directory API are not
transparent. I want to make some progress on this, and I think they should map
directly to what is happening on the filesystem. That's easiest to understand
and the least trappy.
* listAll() should just list files, not readdir() + fstat()
* rename() should just rename() and not rename() + fsync(dir)
* createOutput should just createOutput, not delete() + create()
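To illustrate the listAll() point, here is a rough sketch using java.nio.file
directly. It is not the attached patch: the class name and both method
signatures are assumptions made for the example. Listing reads the directory
once and returns every entry name, and the directory check is a separate,
opt-in call that only the rare caller pays for.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PlainListing {
    /** Lists every entry name in dir without a metadata lookup per entry. */
    static List<String> listAll(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                // No isDirectory() call here, so subdirectories are included.
                names.add(entry.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Hypothetical opt-in check for callers that must skip subdirectories. */
    static boolean isDirectory(Path dir, String name) {
        // This is the per-entry metadata lookup that listAll() avoids.
        return Files.isDirectory(dir.resolve(name));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("listing");
        Files.createFile(dir.resolve("segments_1"));
        Files.createDirectory(dir.resolve("sub"));

        for (String name : listAll(dir)) {
            System.out.println(name + " directory=" + isDirectory(dir, name));
        }
    }
}
```

A caller that copies files out of the directory would filter with
isDirectory() itself; everything else just uses the raw listing.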
> don't filter subdirectories in listAll()
> ----------------------------------------
>
> Key: LUCENE-6241
> URL: https://issues.apache.org/jira/browse/LUCENE-6241
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Attachments: LUCENE-6241.patch, LUCENE-6241.patch
>
>
> The issue is that, today, filtering out subdirectories means listAll() is
> always slow, sometimes MUCH slower, because it must do the fstat()-equivalent
> of each file to check whether it's a directory to filter it out.
> When I benchmarked this on a fast filesystem, doing all these filesystem
> metadata calls only makes listAll() 2.6x slower, but on a non-SSD with slower
> I/O, it can be more than 60x slower.
> Lucene doesn't make subdirectories, so hiding these for abuse cases just
> makes real use cases slower.
> To add insult to injury, most code (e.g. all of Lucene except for where
> RAMDir copies from an FSDir) does not actually care if extraneous files are
> directories or not.
> Finally, it sucks that the name is listAll() when it is doing anything but that.
> I really hate to add a method here to deal with this abusive stuff, but I'd
> rather add isDirectory(String) for the rare code that wants to filter out,
> than just let stuff always be slow.