Yep, and without frequencies too. This should not be all that difficult (IMHO),
especially now that we have DocIdSetIterator as a superclass. As a matter of
fact, even today you could get a DocIdSetIterator from TermDocs or whatever and
use it as a Filter, as a lightweight, in-memory solution. A real solution would
require something like a postings "type flag".
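
To make the lightweight, in-memory idea concrete, here is a rough sketch in plain Java. It is only an illustration: DocIterator is a hypothetical stand-in for whatever postings iterator you have (it is not a Lucene interface), and CategoryBits, collect and over are made-up names. The only real API used is java.util.BitSet.

```java
import java.util.BitSet;

final class CategoryBits {

    // Hypothetical stand-in for a postings iterator: next() advances to the
    // next document and doc() returns its number. Not a Lucene interface.
    interface DocIterator {
        boolean next();
        int doc();
    }

    // Drains the iterator into a bit set holding one bit per document.
    static BitSet collect(DocIterator it, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        while (it.next()) {
            bits.set(it.doc());
        }
        return bits;
    }

    // Small helper for trying the sketch out: iterates over a sorted array.
    static DocIterator over(final int[] docs) {
        return new DocIterator() {
            private int i = -1;

            public boolean next() {
                return ++i < docs.length;
            }

            public int doc() {
                return docs[i];
            }
        };
    }
}
```

The resulting bit set is what a filter would consult per document. It costs one bit per document in the index regardless of how many documents match, which is why this is only the lightweight variant.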
----- Original Message ----
From: robert engels <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Thursday, 7 February, 2008 7:43:33 PM
Subject: postings without position information ?
I think there are many uses of Lucene that would benefit from 'enum' fields,
aka categories. When classifying documents, they are often in one or more
categories.

Lucene could write these postings very efficiently using VINT and RLE (run
length encoding) if the positions information was not stored (since it is not
really useful in these typical cases).

StartingDocNum|NumberOfDocuments...StartingDocNum|NumberOfDocuments

using a bit of the StartingDocNum to know if it was a series.

When a lot of documents are in the same category, and they are added at the
same time, the document numbers would be nearly sequential, allowing very
efficient compression.
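
A minimal sketch of the layout described above, in Java, might look like the following. Several details are assumptions and not part of the original proposal: the class and method names are made up, the flag is placed in the low bit of the start value, the start is stored as a delta from the last document already written, and a lone document omits the count. The variable-length int uses the usual 7-bits-per-byte scheme.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

final class RunLengthPostings {

    // Variable-length int: 7 data bits per byte, high bit set on all but the last byte.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static int readVInt(ByteArrayInputStream in) {
        int b = in.read();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.read();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Encodes strictly increasing doc numbers. Each entry is (delta << 1) | isRun,
    // followed by the run length only when isRun is set (assumed layout).
    // The shift limits deltas to 30 bits in this sketch.
    static byte[] encode(int[] sortedDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lastDoc = 0;
        int i = 0;
        while (i < sortedDocs.length) {
            int j = i;
            // Extend the run while doc numbers stay consecutive.
            while (j + 1 < sortedDocs.length && sortedDocs[j + 1] == sortedDocs[j] + 1) {
                j++;
            }
            int count = j - i + 1;
            int delta = sortedDocs[i] - lastDoc;
            if (count > 1) {
                writeVInt(out, (delta << 1) | 1);
                writeVInt(out, count);
            } else {
                writeVInt(out, delta << 1);
            }
            lastDoc = sortedDocs[j];
            i = j + 1;
        }
        return out.toByteArray();
    }

    // Reverses encode(): expands every entry back into individual doc numbers.
    static List<Integer> decode(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        List<Integer> docs = new ArrayList<>();
        int lastDoc = 0;
        while (in.available() > 0) {
            int header = readVInt(in);
            int start = lastDoc + (header >>> 1);
            int count = (header & 1) != 0 ? readVInt(in) : 1;
            for (int k = 0; k < count; k++) {
                docs.add(start + k);
            }
            lastDoc = start + count - 1;
        }
        return docs;
    }
}
```

With this layout, a category whose documents were added together as one consecutive block is written as a single entry, however many documents it covers.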

Has anyone worked on this? Our previous custom IndexReaderWriter supported it,
and I was wondering if this has made it into the core. I checked the docs/email
and could not find anything.

Thanks.
Robert
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]