[ https://issues.apache.org/jira/browse/LUCENE-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318443#comment-14318443 ]

Robert Muir commented on LUCENE-6241:
-------------------------------------

{quote}
Maybe we should simply remove RAMDir's copy ctor? That method seems 
abusive/trappy to me too!
{quote}

Well, I think there is a real use case for this? However badly it might 
perform, some people want to load their entire index into memory.

I would greatly prefer that we had an option on mmap that called 
MappedByteBuffer.load() for these users. That method even makes the correct 
madvise() call and reads one byte per page, doing this essentially as 
efficiently as the OS can. I think it's a better solution for that use case.
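The mmap-and-preload alternative can be sketched with plain JDK NIO. This is a minimal sketch, not Lucene code: the class and method names (`PreloadSketch`, `mapAndPreload`) are hypothetical, and it assumes each file fits in a single mapping (under 2 GB), whereas a real directory implementation would map large files in chunks.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Lucene: map one index file read-only and
// ask the OS to bring its pages into physical memory.
final class PreloadSketch {
    private PreloadSketch() {}

    static MappedByteBuffer mapAndPreload(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            // A single mapping is limited to Integer.MAX_VALUE bytes (assumption
            // of this sketch: larger files are out of scope).
            if (size > Integer.MAX_VALUE) {
                throw new IllegalArgumentException("file too large for one mapping: " + file);
            }
            // The mapping stays valid after the channel is closed.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
            // Best-effort hint: the OS may still evict these pages later.
            buf.load();
            return buf;
        }
    }
}
```

Because the pages stay in the OS page cache rather than being copied onto the Java heap, this avoids the duplicate copy that loading the index into a RAMDir implies.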

In general, for this issue I wanted to avoid controversies and larger changes 
like this. My problem is that operations on the Directory API are not 
transparent. I want to make some progress here, and I think these operations 
should map directly to what is happening on the filesystem. That's easiest to 
understand and the least trappy:
* listAll() should just list files, not readdir() + fstat()
* rename() should just rename() and not rename() + fsync(dir)
* createOutput() should just create the output, not delete() + create()
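The three rules above can be illustrated with plain JDK file operations, each issuing only the call its name implies. This is a hedged sketch, not Lucene's Directory API or the attached patch: `TransparentFsOps` and its method signatures are hypothetical, and whether `Files.newDirectoryStream` avoids per-entry stat calls depends on the platform's JDK implementation.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helpers: each operation maps to a single filesystem call.
final class TransparentFsOps {
    private TransparentFsOps() {}

    // listAll(): one directory-listing pass that requests no per-entry
    // attributes, so subdirectories come back like any other name.
    static List<String> listAll(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                names.add(entry.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    // rename(): a single atomic rename; deliberately no fsync of the directory.
    static void rename(Path dir, String source, String dest) throws IOException {
        Files.move(dir.resolve(source), dir.resolve(dest), StandardCopyOption.ATOMIC_MOVE);
    }

    // createOutput(): create only, with no delete() first; an existing file
    // raises FileAlreadyExistsException instead of being silently replaced.
    static OutputStream createOutput(Path dir, String name) throws IOException {
        return Files.newOutputStream(dir.resolve(name),
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
    }
}
```

Note that a subdirectory shows up in the listAll() result here; filtering it out is left to the callers that actually care.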


> don't filter subdirectories in listAll()
> ----------------------------------------
>
>                 Key: LUCENE-6241
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6241
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>         Attachments: LUCENE-6241.patch, LUCENE-6241.patch
>
>
> The issue is that today listAll() is always slow, sometimes MUCH slower, 
> because it must do the fstat() equivalent on each file to check whether 
> it's a directory and filter it out.
> When I benchmarked this on a fast filesystem, all these filesystem 
> metadata calls made listAll() only 2.6x slower, but on a non-SSD with 
> slower I/O, it can be more than 60x slower.
> Lucene doesn't create subdirectories, so hiding them to accommodate abuse 
> cases just makes real use cases slower.
> To add insult to injury, most code (e.g. all of Lucene except where 
> RAMDir copies from an FSDir) does not actually care whether extraneous 
> files are directories or not.
> Finally, it sucks that the method is named listAll() when it does much 
> more than list.
> I really hate to add a method here to deal with this abusive stuff, but I'd 
> rather add isDirectory(String) for the rare code that wants to filter them 
> out than let everything always be slow.
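The proposed trade-off (keep listing cheap, let the rare filtering caller opt in) might look like this in plain JDK terms. This is a sketch under assumptions: `FilteringCaller`, `filesToCopy`, and the two-argument `isDirectory(Path, String)` are hypothetical stand-ins for the proposed isDirectory(String), not the code in the attached patch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical caller that wants to skip subdirectories (e.g. a copy from an
// FSDir into memory); only this caller pays for the per-entry check.
final class FilteringCaller {
    private FilteringCaller() {}

    // Stand-in for the proposed isDirectory(String): one metadata call per name.
    static boolean isDirectory(Path dir, String name) {
        return Files.isDirectory(dir.resolve(name));
    }

    // List names cheaply, then filter out directories explicitly.
    static List<String> filesToCopy(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (!isDirectory(dir, name)) {
                    names.add(name);
                }
            }
        }
        Collections.sort(names);
        return names;
    }
}
```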



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
