Nick,

No problem. I didn't take the time to be a bit clearer. I didn't mean to
recompress the file. My understanding is that:
- He has to download new indices
- File size is not a concern

Based on that, I would recommend zipping each index into a file; the app then
downloads it, uncompresses it, stores the index somewhere, and uses the
uncompressed index.
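
To make the unzip step concrete, here is a minimal sketch in Java, standing in for C# (in .NET, DotNetZip, mentioned further down the thread, would play the same role). It assumes each downloaded index arrives as one zip archive and that the app extracts it once into a local directory, which is then opened as an ordinary on-disk index. The class name `IndexUnpacker` is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class IndexUnpacker {
    // Extracts every entry of zipFile into targetDir, creating
    // subdirectories as needed. Done once per download, not per run.
    public static void unzip(Path zipFile, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        Path root = targetDir.toAbsolutePath().normalize();
        try (InputStream in = Files.newInputStream(zipFile);
             ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path dest = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would escape targetDir.
                if (!dest.startsWith(root)) {
                    throw new IOException("Bad zip entry: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(dest);
                } else {
                    Files.createDirectories(dest.getParent());
                    // Copies only the current entry's bytes; zin stays open.
                    Files.copy(zin, dest, StandardCopyOption.REPLACE_EXISTING);
                }
                zin.closeEntry();
            }
        }
    }
}
```

After `unzip(...)` returns, the app would point Lucene.Net at the extracted directory like any other on-disk index (not shown here), so searches pay no decompression cost.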

The other proposed solutions, like having a compressed stream (I think this
will be very slow) or using SolFS (have you seen the price list?), sound
technically interesting, but they seem way too complex for the task he wants
to accomplish.
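
A short Java sketch of why the compressed-stream route is likely to be slow for an index. `DeflaterOutputStream`/`InflaterInputStream` stand in for a .NET `DeflateStream` layered over a `FileStream`; the class and method names are hypothetical, and this is not a Lucene `Directory` implementation. The point it illustrates: a deflate stream has no random access, so reading the byte at a given offset means decompressing everything before it, whereas an index reader seeks around inside its files constantly.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class CompressedFileSketch {
    // Writes data to path through a deflate compressor, so the file is
    // always compressed on disk.
    public static void write(Path path, byte[] data) throws IOException {
        try (OutputStream out = new DeflaterOutputStream(Files.newOutputStream(path))) {
            out.write(data);
        }
    }

    // Returns the byte at 'offset' (0-255). There is no seek on a deflate
    // stream, so every call decompresses from the start up to 'offset':
    // the cost grows with the offset, on every random read.
    public static int readAt(Path path, long offset) throws IOException {
        try (InputStream in = new InflaterInputStream(Files.newInputStream(path))) {
            byte[] buf = new byte[8192];
            long remaining = offset;
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) throw new EOFException("offset past end of data");
                remaining -= n;
            }
            int b = in.read();
            if (b < 0) throw new EOFException("offset past end of data");
            return b;
        }
    }
}
```

Sequential reads would be fine; it is the seek-heavy access pattern of a search that makes this approach expensive.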

Maybe I am missing something?

Hans


-----Original Message-----
From: Nicholas Paldino [.NET/C# MVP] [mailto:[email protected]] 
Sent: Friday, February 26, 2010 7:07 PM
To: [email protected]
Subject: RE: Lucene index file container

Hans,

        With all due respect, that's an unqualified statement.  What is the
gain in unzipping the file and rezipping it if file size is not a concern?
All you do is incur CPU time and I/O costs in doing so with no gains
whatsoever.

        Again, if you have a need to transport the index, then you should
perform the act of placing it in a container outside the scope of your
application, as you are more than likely going to transport the index less
than you are actually going to *use* the index.
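
Following that advice, packaging can be a separate build or publish step rather than something the application does at runtime. A minimal Java sketch, assuming each finished index lives in its own directory and is shipped as one zip; `IndexPacker` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class IndexPacker {
    // Zips every regular file under indexDir into zipFile, storing paths
    // relative to indexDir with '/' separators, as the zip format expects.
    // zipFile should not live inside indexDir.
    public static void pack(Path indexDir, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(indexDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path file : files) {
                String name = indexDir.relativize(file).toString().replace('\\', '/');
                zos.putNextEntry(new ZipEntry(name));
                Files.copy(file, zos); // does not close zos
                zos.closeEntry();
            }
        }
    }

    // Usage: java IndexPacker <indexDir> <output.zip>
    public static void main(String[] args) throws IOException {
        pack(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The application itself never rezips anything; it only extracts the archive once after download.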

                - Nick

-----Original Message-----
From: Hans Merkl [mailto:[email protected]] 
Sent: Friday, February 26, 2010 6:29 PM
To: [email protected]
Subject: RE: Lucene index file container

Then just unzip the index after downloading and store it in the application
directory. That's the best approach IMO.

-----Original Message-----
From: Andrew Schuler [mailto:[email protected]] 
Sent: Friday, February 26, 2010 6:18 PM
To: [email protected]
Subject: Re: Lucene index file container

Thanks for all the comments.
For what it's worth, for what I'm doing file size is not a concern. Index
performance is paramount. The index will be static, with no adding or
deleting; it's read-only.


On Fri, Feb 26, 2010 at 2:49 PM, Nicholas Paldino [.NET/C# MVP] <[email protected]> wrote:

> Andrew,
>
>        If you are going to unpack the index into a temp directory and then
> repack the file when you are done, then you are going to incur a cost
> on startup and on teardown of the process which is mainly I/O and CPU
> bound (I/O because you have to read the zip file from disk and then write
> the unpacked file from the zip to another location, and CPU bound because
> you are translating the byte stream while unpacking).
>
>        That approach doesn't do anything but add that additional I/O and
> CPU overhead on startup.  The "big win" for compressing the file is to save
> space on disk, or whatever medium the byte stream is being persisted to.
>
>        If all you do is unzip the file in the beginning and zip it up at
> the end, then from your app's point of view, you do a lot of extra work for
> nothing.  Unless you have real disk space issues, I'd recommend against
> this.
>
>        Now, if you were to create a new Directory class which uses a
> GZipStream or DeflateStream as a façade over the FileStream which writes to
> disk, then you are reaping the benefits of compressing the file.  The index
> will always be compressed on disk and you are realizing the gains.
>
>        The cost of doing this, however, is more CPU time (to perform the
> translation), with a gain of fewer I/O operations to disk (since fewer
> bytes are being written to disk).
>
>        Depending on how much activity you have reading from and writing to
> the index, it might or might not make an impact.  You have to measure that
> yourself given your application's use of the index.
>
>        If file size is *truly* a concern, have you considered just setting
> the compression flag on the *folder* that contains the index files?  Any
> files that are added or updated will automatically be compressed if the
> flag is set on the folder, so doing it in code is busywork when the OS
> automatically provides it for you (assuming you are on Windows, which is a
> safe bet given you are running .NET, but not absolute, of course).
>
>                - Nick
>
> -----Original Message-----
> From: Andrew Schuler [mailto:[email protected]]
> Sent: Friday, February 26, 2010 4:48 PM
> To: [email protected]
> Subject: Re: Lucene index file container
>
> Thanks for both answers on this.
> I considered a zip file but was unsure of the associated overhead of
> unpacking the file. Does anyone have experience running an index directly
> out of a zip file?
> Are my worries unfounded? I was just trying to leverage the experience of
> the group, but otherwise I'll just have to run some tests on my own.
>
>
>
> On Fri, Feb 26, 2010 at 11:55 AM, Nicholas Petersen <[email protected]> wrote:
>
> > <Can anyone recommend a way to package the index into say some type of
> > file container>
> >
> > If I understand correctly, it sounds like you're asking for a textbook
> > implementation of an archiver, like a zip file.  If so, DotNetZip is a
> > solid product, very easy to use, very fast.  Highly recommended:
> > http://www.codeplex.com/DotNetZip
> >
> > Best,
> > Nick
> >
> >
> >
> > On Fri, Feb 26, 2010 at 2:47 PM, Andrew Schuler <[email protected]> wrote:
> >
> > > Yes, that is doable. I was just thinking it would be cleaner to wrap the
> > > indexes (there will be more than one) in some sort of file container.
> > > One of the things I'd like to do is allow the user to download
> > > pre-packaged indexes and load them into the app. This would be easier
> > > with a single file than with a directory of files, no?
> > >
> > >
> > > On Fri, Feb 26, 2010 at 11:41 AM, Hans Merkl <[email protected]> wrote:
> > >
> > > > Can't you add all the files in the index directory to the installer
> > > > package?
> > > > This should be pretty straightforward.
> > > >
> > > > -----Original Message-----
> > > > From: Andrew Schuler [mailto:[email protected]]
> > > > Sent: Friday, February 26, 2010 12:16 PM
> > > > To: [email protected]
> > > > Subject: Lucene index file container
> > > >
> > > > The discussion about encrypting an index has me thinking about a
> > > > current use I have for Lucene.net. I'm building a small app with a
> > > > static index distributed with it. Can anyone recommend a way to package
> > > > the index into say some type of file container for inclusion in an
> > > > installer package?
> > > >
> > > > -andy
> > > >
> > > >
> > > >
> > >
> >
>


