Andrew,
If you unpack the index into a temp directory and then repack it when
you are done, you incur a cost on startup and on teardown of the process.
That cost is both I/O bound (you have to read the zip file from disk and
then write the unpacked files to another location) and CPU bound (you are
decompressing the byte stream while unpacking).
That approach does nothing but add I/O and CPU overhead on startup and
teardown. The "big win" for compressing the file is to save
space on disk, or whatever medium the byte stream is being persisted to.
If all you do is unzip the file in the beginning and zip it up at
the end, then from your app's point of view, you do a lot of extra work for
nothing. Unless you have real disk space issues, I'd recommend against
this.
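For illustration only, here is a minimal sketch of that unpack-then-repack
round trip. It is written in Python rather than .NET, and the archive name,
the file name "index_a.bin", and the helper functions are all made up for
the example; nothing here is Lucene.Net API. It just shows that the working
copy sits on disk at full size while the app runs, and that the whole
archive is rewritten on the way out.

```python
import os
import tempfile
import zipfile


def unpack(archive_path, work_dir):
    """Startup cost: read the archive and write every file out uncompressed."""
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(work_dir)


def repack(work_dir, archive_path):
    """Teardown cost: read every working file and recompress it into the archive."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(os.listdir(work_dir)):
            zf.write(os.path.join(work_dir, name), arcname=name)


def round_trip_demo():
    # Hypothetical, highly repetitive stand-in for index data.
    payload = b"term:lucene doc:42 freq:7\n" * 20000
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "index.zip")
        work_dir = os.path.join(tmp, "work")
        os.mkdir(work_dir)

        # Build the "shipped" archive.
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.writestr("index_a.bin", payload)

        unpack(archive, work_dir)
        working_size = os.path.getsize(os.path.join(work_dir, "index_a.bin"))

        repack(work_dir, archive)
        archive_size = os.path.getsize(archive)
        with zipfile.ZipFile(archive) as zf:
            names = zf.namelist()

        return len(payload), working_size, archive_size, names
```

While the app is running, the disk holds both the archive and the full-size
working copy, so the space saving only exists while the app is not in use.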
Now, if you were to create a new Directory class which uses a
GZipStream or DeflateStream as a façade over the FileStream which writes to
disk, then you are reaping the benefits of compressing the file. The index
will always be compressed on disk and you are realizing the gains.
The cost of doing this, however, is more CPU time (to compress and
decompress the data), offset by fewer I/O operations to disk (since
fewer bytes are being written to and read from disk).
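As a rough sketch of the layering idea (again in Python, purely as an
analogue: gzip.GzipFile over a plain file object plays the role that
GZipStream over FileStream would play in .NET). The function names and the
sample payload are hypothetical, and this is not a Lucene.Net Directory
implementation. A real one would also have to deal with seeking inside
compressed files, which this sketch does not attempt.

```python
import gzip
import os
import tempfile


def write_compressed(path, payload):
    """Write payload through a gzip stream layered over a plain file stream."""
    with open(path, "wb") as raw:                           # underlying file stream
        with gzip.GzipFile(fileobj=raw, mode="wb") as gz:   # compressing facade
            gz.write(payload)


def read_compressed(path):
    """Read the payload back through the same kind of layered stream."""
    with open(path, "rb") as raw:
        with gzip.GzipFile(fileobj=raw, mode="rb") as gz:
            return gz.read()


def facade_demo():
    # Hypothetical, highly repetitive stand-in for index data.
    payload = b"term:lucene doc:42 freq:7\n" * 20000
    with tempfile.TemporaryDirectory() as tmp:
        plain_path = os.path.join(tmp, "segment.plain")
        packed_path = os.path.join(tmp, "segment.gz")

        # Baseline: the same bytes written with no compression layer.
        with open(plain_path, "wb") as f:
            f.write(payload)

        write_compressed(packed_path, payload)
        restored = read_compressed(packed_path)

        return (
            os.path.getsize(plain_path),
            os.path.getsize(packed_path),
            restored == payload,
        )
```

The caller only ever sees the uncompressed bytes; the data on disk stays
compressed the whole time, which is the difference from unzipping up front.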
Depending on how much read/write activity the index sees, it might or
might not make an impact. You have to measure that
yourself, given your application's use of the index.
If file size is ^truly^ a concern, have you considered just setting
the compression flag on the *folder* that contains the index files? Any
files that are added or updated will automatically be compressed if the
flag is set on the folder. Doing it in code is busywork when the OS
provides it for you (assuming you are on Windows, which is a
safe bet given you are running .NET, but not absolute, of course).
- Nick
-----Original Message-----
From: Andrew Schuler [mailto:[email protected]]
Sent: Friday, February 26, 2010 4:48 PM
To: [email protected]
Subject: Re: Lucene index file container
Thanks for both answers on this.
I considered a zip file but was unsure of the associated overhead of
unpacking the file. Does anyone have experience running an index directly
out of a zip file?
Are my worries unfounded? I was just trying to leverage the experience of
the group, but otherwise I'll just have to run some tests on my own.
On Fri, Feb 26, 2010 at 11:55 AM, Nicholas Petersen
<[email protected]> wrote:
> <Can anyone recommend a way to package the index into say some type of
> file container>
>
> If I understand correctly, it sounds like you're asking for a textbook
> implementation of an archiver, like a zip file. If so, DotNetZip is a
> solid product, very easy to use, very fast. Highly recommended.
> http://www.codeplex.com/DotNetZip.
>
> Best,
> Nick
>
>
>
> On Fri, Feb 26, 2010 at 2:47 PM, Andrew Schuler <[email protected]>
> wrote:
>
> > Yes, that is do-able. I was just thinking it would be cleaner to wrap the
> > indexes (there will be more than one) in some sort of file container. One
> > of the things I'd like to do is be able to allow the user to download
> > pre-packaged indexes and load them into the app. This would be easier with
> > a file than a directory of files, no?
> >
> >
> > On Fri, Feb 26, 2010 at 11:41 AM, Hans Merkl <[email protected]> wrote:
> >
> > > Can't you add all the files in the index directory to the installer
> > > package?
> > > This should be pretty straightforward.
> > >
> > > -----Original Message-----
> > > From: Andrew Schuler [mailto:[email protected]]
> > > Sent: Friday, February 26, 2010 12:16 PM
> > > To: [email protected]
> > > Subject: Lucene index file container
> > >
> > > The discussion about encrypting an index has me thinking about a current
> > > use I have for Lucene.net. I'm building a small app with a static index
> > > distributed with it. Can anyone recommend a way to package the index into
> > > say some type of file container for inclusion in an installer package?
> > >
> > > -andy
> > >
> > >
> > >
> >
>
