Hi,

Wide characters are stored using UTF-8, so any character that fits in
7 bits (i.e. plain ASCII) takes up exactly the same amount of space on
disk either way. I would stick with the wide character format unless
you have a compelling reason to use ASCII.
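
To illustrate why the two builds give the same size, here is a rough
sketch of how many bytes a code point needs in standard UTF-8. The
helper names (utf8_bytes, utf8_length) are made up for this example and
are not part of CLucene, and the exact on-disk encoding CLucene uses may
differ in detail from plain UTF-8; the point is only that 7-bit
characters cost one byte each.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: bytes needed to encode one code point in
// standard UTF-8.
std::size_t utf8_bytes(char32_t cp) {
    if (cp < 0x80)    return 1;  // 7-bit ASCII: one byte
    if (cp < 0x800)   return 2;  // up to 11 bits: two bytes
    if (cp < 0x10000) return 3;  // up to 16 bits: three bytes
    return 4;                    // everything else: four bytes
}

// Hypothetical helper: total UTF-8 size of a string of code points.
std::size_t utf8_length(const std::u32string& text) {
    std::size_t total = 0;
    for (char32_t cp : text) {
        total += utf8_bytes(cp);
    }
    return total;
}
```

With these, utf8_length(U"hello") comes out as 5, the same as the
five-byte ASCII string, so switching TCHAR to char would not be expected
to shrink an all-ASCII index.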

Reducing size: Luke (http://code.google.com/p/luke/) will help you
figure out what you've actually stored in the index. Reducing the
number of fields you 'STORE' helps a lot. There is a compressed field
type in CLucene, but it was a bit hard to get going until the latest
versions (now it's just another flag: Field::STORE_COMPRESS).

ben

On Wed, Jun 8, 2011 at 2:07 AM, Teryl Taylor <teryl.tay...@gmail.com> wrote:
> Hi everyone,
>
> I just had a quick question about search engine size. The search engine
> takes everything as wide characters. Since everything I'm putting in the
> database is ASCII, I thought I'd compile the search engine with ASCII Mode
> on. This took the TCHAR and defined it as a char rather than a wchar_t.
> When I recompiled everything, and ran it, the search engine database was the
> exact same size as the original wide char. Anyone know why that is? I
> would have thought using chars instead of wide chars would have reduced the
> size. Am I missing a configuration?
>
> Also, does anyone have any tips on reducing the size of a search engine?
> Lucene doesn't support a compression mechanism, right? It's not that the
> database is bloated or anything, it's just any size reduction I can get is
> beneficial, so I'm just investigating ways to get it as small as possible.
>
>
> Thanks,
>
> Teryl
>
>
>
>
>
> ------------------------------------------------------------------------------
> EditLive Enterprise is the world's most technically advanced content
> authoring tool. Experience the power of Track Changes, Inline Image
> Editing and ensure content is compliant with Accessibility Checking.
> http://p.sf.net/sfu/ephox-dev2dev
> _______________________________________________
> CLucene-developers mailing list
> CLucene-developers@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/clucene-developers
>
>



-- 
-------------------------------------
Ben van Klinken

Mob: 0401 921847
Em: b...@villagechief.com
