I second the motion to have a place to store contributed Document
generators.
I've developed an HTML file handler that creates a Document using JTidy
under the covers to DOM'ify it, pulls out only the non-HTML-tagged text
into a content field, and strips the title out as a separate field. It
I would go so far as to even recommend that javacc.home be defined in
build.xml with a default location that is the recommended home for it, and if
it needs adjusting it would happen via build.properties or -D command-line
switch.
Again, I'm here to help and volunteer my Ant expertise to
Lucene and I've seen messages from
other users with the same problem). Can you make the changes and submit
the patches? Then, if everybody agrees, they can be
committed.
--Daniel
-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Saturday, February 16, 2002
HttpUnit (which uses JTidy under the covers) makes child's play of
pulling out links and navigating to them.
The only caveat (and this would be true for practically all tools, I
suspect) is that the HTML has to be relatively well-formed for it to work
well. JTidy can be somewhat forgiving
bigger) than your original data
set.
--Peter
On 5/20/02 4:16 PM, Erik Hatcher [EMAIL PROTECTED] wrote:
I'm indexing 900+ files (less than 1,000) that total about 15MB in size.
These are text files and HTML files. I only index them into a few
fields (title, content, filename). My index
What build system are you using? Why can't you just blindly pick up the
entire index directory and incorporate that? This is what I do with a
webapp build that indexes documents during the build and rolls them into the
WAR file. All using Ant, of course! :)
Erik
- Original Message -
The application we built for our book (Java Development with Ant -
http://www.manning.com/antbook/) uses Lucene to build an index from an Ant
build (think static documentation here); the index was then incorporated
into a few different environments:
- command-line query tool
- Ant query task
Like a charm.
- Original Message -
From: Rakesh Ayilliath [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, July 03, 2002 12:38 AM
Subject: Re: Wildcard searching
Dear All,
How well does Lucene integrate with Struts ?
regards,
Rakesh.
Subject: Re: Ant index task
Erik,
Does this require the latest and greatest Ant 1.5*? I've got 1.4.1.
BUILD FAILED
/home/otis/cvs-repositories/jakarta/jakarta-lucene-sandbox/projects/ant/build.xml:72:
The jar task doesn't support the destfile attribute.
Thanks,
Otis
--- Erik Hatcher
OK, it's been relocated to contributions/ant in the sandbox, and the
jar task has been fixed to work with versions before Ant 1.5 (sorry, I'm
always running a self-built latest greatest version of Ant :)
- Original Message -
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
I searched the archives, but may have missed it. I suspect someone has
done this before:
How can I read a Lucene index that is stored within a JAR file rather
than directly on the file system? I want to integrate a read-only index
into a WAR and an EJB environment which has the easiest
Tim Dawson wrote:
I need to do almost exactly the same thing as Erik - create a read-only
index on our help webapp that will be packaged inside an ear file.
Eventually I'll have a look at implementing this (and of course
contributing it back to Lucene's codebase) - it's on my to-do list. But
Have a look at the HtmlDocument class in the ant contributions
directory of jakarta-lucene-sandbox in Jakarta's CVS:
http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/ant/src/main/org/apache/lucene/ant/HtmlDocument.java?annotate=1.1
I wrote this and it uses JTidy to
Look in the Lucene sandbox in CVS. I contributed an Ant task that
indexed HTML documents. It uses JTidy under the covers to parse HTML
into title and body content, and it could be extended to pull other
information such as meta keywords.
Erik
Leo Galambos wrote:
So, I have tried this with
information.
Let me know if you experience any issues with it, or have comments.
Erik
Erik Hatcher wrote:
Look in the Lucene sandbox in CVS. I contributed an Ant task that
indexed HTML documents. It uses JTidy under the covers to parse HTML
into title and body content, and it could be extended
Is it possible to get a collection of documents based on whether they
have a particular field (regardless of value)? I'm indexing HTML
documents, and want to pull out some information that may or may not be
present in the documents (and adding a field if that information exists
but not
-
From: Erik Hatcher [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, December 30, 2002 12:24 PM
Subject: How to obtain unique field values
One final Lucene question for the day...
Is it possible for me to retrieve all the values of a particular field
that exists within an index, across all
a different Analyzer (WhitespaceAnalyzer? Maybe you can try that one
before writing your own).
Otis
--- Erik Hatcher [EMAIL PROTECTED] wrote:
I have a keyword field that has a value like: /path/to/something.
Is there a way I can use QueryParser to get documents that have that
field value? It seems
Doug,
Your points are well taken and I appreciate your time in replying to
this. I'm on the same wavelength with this thinking about QueryParser,
and I realize I'm attempting to push it past its designed simplicity.
I'm not as knowledgeable (and who is?!) on Lucene's API and design as
you
Are folks using the SearchBean code found in the
contributions/searchbean area of the sandbox?
I've just spent a while attempting to get an example working with it
and I've ended up having to tweak the code to get it working partially.
First, I encountered a compilation error because of an
On Sunday, January 19, 2003, at 11:17 PM, Terry Steichen wrote:
I've recently made some specific additional changes (regarding
scoring, as I recall). At this point I don't recall which of these
have been included in the distribution and which have not.
I'm using the latest CVS HEAD version
Terry,
Thanks for sharing your changes.
I have commit access to the sandbox, so if I get into using the
HitsIterator more I'll be happy to commit these types of changes as
long as no other dev folks complain.
If you can find the time, please post your patches to Bugzilla for safe
keeping,
On Monday, January 20, 2003, at 07:33 PM, Terry Steichen wrote:
Erik,
You're welcome to the changes. But (a) I'm not sure the changes are
general enough to appropriately be added to the code base
The current CVS of HitsIterator is hard-coded to a single field to sort
upon. Your code is
Unfortunately I don't believe date field range queries work with
QueryParser, or at least not human-readable dates.
Is that correct?
I think it supports date ranges if they are turned into a numeric
format, but no human would type that kind of query in. I'm sure
supporting true date range
-form dates (provide some type of UI element to enter year, month,
and day separately) or give some copy stating dates should be in MMDD
format.
-Mike.
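Mike's first suggestion (collecting year, month and day separately) amounts to building a fixed-width date string whose string order matches chronological order, which matters because a range over text terms is a string comparison. A minimal Java sketch; the yyyyMMdd layout and the DateKeys name are assumptions for illustration, not the encoding Lucene's DateField uses:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

/**
 * Hypothetical helper: turns dates into fixed-width yyyyMMdd strings so that
 * string order matches chronological order. The yyyyMMdd layout is an
 * assumption for illustration, not the encoding of Lucene's DateField.
 */
final class DateKeys {
    private DateKeys() {}

    /** Builds a key from separately entered year, month and day values. */
    static String fromParts(int year, int month, int day) {
        // Zero-pad each part so every key is exactly 8 characters long.
        return String.format(Locale.ROOT, "%04d%02d%02d", year, month, day);
    }

    /** Converts a human-entered MM/dd/yy date such as "12/31/02". */
    static String fromHumanDate(String mmddyy) {
        SimpleDateFormat in = new SimpleDateFormat("MM/dd/yy", Locale.ROOT);
        in.setLenient(false); // reject impossible dates such as 13/45/02
        try {
            return new SimpleDateFormat("yyyyMMdd", Locale.ROOT).format(in.parse(mmddyy));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not an MM/dd/yy date: " + mmddyy, e);
        }
    }

    /** True if key lies in the inclusive range [lower, upper]. */
    static boolean inRange(String key, String lower, String upper) {
        return key.compareTo(lower) >= 0 && key.compareTo(upper) <= 0;
    }
}
```

Comparing the raw strings instead would put "12/31/01" after "01/01/02", which is why free-form dates do not range-query correctly.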
Erik Hatcher wrote:
Unfortunately I don't believe date field range queries work with
QueryParser, or at least not human-readable dates
human being, I don't know.)
Regards,
Terry
PS: Just to clarify, I believe that dates represented this way are
internally treated as strings by Lucene.
- Original Message -
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, January 22, 2003 9:49
;
String end = "12/31/02";
And the results were the same, all expected documents were returned.
It does not work to use null for begin or end to leave either side of
the range open-ended.
Erik
On Wednesday, January 22, 2003, at 08:56 PM, Erik Hatcher wrote:
I wanted to see this first-hand, so I
So you are using 1.3dev1?
I'm not having the same effect you are - and looking at QueryParser.jj
it does this:
private Query getRangeQuery(String field,
Analyzer analyzer,
String part1,
String part2,
on this thread. Back to your regular
programming...
On Wednesday, January 22, 2003, at 11:28 PM, Erik Hatcher wrote:
Ah, maybe it's how we are indexing our fields differently? How are you
indexing your my_date_field? I'm using this syntax:
Field.Keyword(fieldName, new Date())
Maybe you are indexing
Again my apologies to Terry who was patient and confident despite my
misunderstanding.
I saw a change earlier today (on lucene-dev) with the QueryParser
syntax updated with more details on the range query format. I think
the ambiguity of saying a date field is still there though, since the
If you look at the contributions/ant area of the Lucene sandbox in CVS
you'll see my HtmlDocument class which uses JTidy.
Rather than making up some invalid HTML tag, I'd recommend you separate
your navigation section with a div or span with a special
class="navigation" or something like
On Thursday, January 30, 2003, at 06:59 PM, Michael Wechner wrote:
Well, I haven't found out how to use JTidy to ignore such tags that
have such a class.
You did it the way I envisioned. I did not expect JTidy to have a way
to ignore tags either, but rather having to do it the laborious way
My current day job project uses SQL Server (yes, Slammer hit several
folks, sheesh!) and it has built-in full-text indexing of IMAGE fields
that correspond to Office documents (or anything Index Server can
index, I suppose). It works very well and made our Word document
indexing issue a
The problem with that solution is the same as what the other thread
about OutOfMemory is discussing with wildcard queries.
Just prefixing something with a fixed query to 'hack' a wildcard query
could lead to performance/memory issues.
I recommend indexing the file extension (or mime type) as a
It will make multiple tokens within the same field just as if you
had used a tokenized field with all the values (and your analyzer split
them in the manner you're adding them). Does that make sense?
In other words, adding keywords foo and bar is the same as using a
text field as foo bar
Very well done
On Monday, July 14, 2003, at 11:47 AM, Andrzej Bialecki wrote:
Dear Lucene Users,
Luke is a diagnostic tool for Lucene
(http://jakarta.apache.org/lucene) indexes. It enables you to browse
documents in existing indexes, perform queries, navigate through
terms, optimize
On Friday, July 18, 2003, at 01:35 PM, Raible, Matt wrote:
I'm planning on having different index directories for each user, in
the format indexDir/username. This will allow a single site to be
indexed, or I can loop through all the users and index them all. We're
going to leave up the
On Wednesday, July 30, 2003, at 06:16 AM, Gregor Heinrich wrote:
I would like to have unique term texts in my term enumeration. That is,
across all fields there should be no duplicate term text.
An easy solution would be to only use one field.
But does someone know an alternative way with
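If the term list is small enough to hold in memory, one alternative to a single field is to walk every term and keep only the distinct texts. The sketch below models each term as a hypothetical {field, text} pair rather than using Lucene's own term enumeration, so it only illustrates the de-duplication step:

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Sketch: collapse (field, text) term pairs into distinct term texts.
 * A term is modelled as a String[2] {field, text}; in a real index the
 * pairs would come from walking the index's term enumeration.
 */
final class DistinctTermTexts {
    private DistinctTermTexts() {}

    static SortedSet<String> of(List<String[]> terms) {
        SortedSet<String> texts = new TreeSet<>();
        for (String[] term : terms) {
            texts.add(term[1]); // keep the text, ignore term[0] (the field name)
        }
        return texts;
    }
}
```

A TreeSet keeps the texts sorted, mirroring the sorted order terms have in an index; memory grows with the number of distinct texts.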
A Lucene Intro article I recently wrote for java.net has just been
published:
http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html
Erik
p.s. Am I an official committer now with the repository enabled for
ehatcher? If so, then I'll commit this link to the resources section
of the site
I'd love to see there be quality implementations of the Lucene API in
other languages, that are up to date with the latest Java codebase.
I'm embarking on a Ruby port, which I'm hosting at rubyforge.org.
There is a Python version called Lupy.
A related question I have is what about
On Thursday, July 31, 2003, at 11:22 AM, Scott Ganyo wrote:
Do these implementations maintain file compatibility with the Java
version?
Lupy claims to. I don't know about NLucene, but it should. Any port
of Lucene to another language should (out of respect and common-sense)
maintain index
]msgId=763762
http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene-
[EMAIL PROTECTED]msgId=774059
Another Eric, Erik Hatcher, is active in both the Ant and Lucene scenes
(as I found out at a nofluffjuststuff.com talk). So hopefully good things
will come about.
Hi Chris!
Yeah, I'll (eventually) have
QueryParser, at least in Lucene's CVS, supports an attribute to toggle
this capability.
QueryParser.setOperator(QueryParser.DEFAULT_OPERATOR_AND)
On Monday, August 4, 2003, at 06:19 PM, Sebastien X wrote:
Hi everybody,
When I make a search, for example : Jakarta lucene
The search engine
On Monday, August 11, 2003, at 11:12 AM, Günter Kukies wrote:
Hi,
Nice tool.
Here are some points for further development:
show the contents of a Reader-valued Field
But Reader-valued fields are not stored in the index
(Field.Text(name, Reader), that is). The only thing it could show is
the
You should re-index using 1.3 when you upgrade the library to 1.3. I
don't know the differences specifically (without doing some CVS diffs)
but there were some changes that affect the index format.
Erik
On Tuesday, August 5, 2003, at 11:48 AM, hui wrote:
Thank you, Sebastien. It works now.
As long as you finish any index operations when you checkpoint it,
there shouldn't be any problems. A Lucene index is just regular ol'
filesystem files just like everything else, so there is no problem
storing it off and using it later.
Erik
On Thursday, August 7, 2003, at 12:56 PM, Rob Outar
It's open-source... have a look yourself! :))
On Saturday, August 16, 2003, at 01:41 AM, Rociel Buico wrote:
hello,
I would like to ask what algorithm or statistics the Hits.score()
method uses.
Is it bayesian?
tia,
buics
Using the QueryFilter would help with refining a search based on
hits from a previous search, but it wouldn't help with the "like" part
you asked about.
I'm interested in what you turn up with this though.
Erik
On Monday, August 18, 2003, at 01:11 PM, Terry Steichen wrote:
Is it
On Tuesday, August 26, 2003, at 12:53 AM, Mark Woon wrote:
1) How can I search all fields at the same time? The QueryParser
seems to only search one specific field.
The common thing I've done and seen others do is glue all the fields
together into a master searchable field named something like
What you are doing looks fine to me. I'm sure these are obvious
questions, kinda like is your computer plugged in?, but here goes:
- How are you determining that the document is still there? With an
IndexReader? IndexSearcher?
- A freshly created (i.e. after the delete)
On Tuesday, September 2, 2003, at 04:11 PM, Joe Paulsen wrote:
It seems when I do a search such as "covered wagon"~5 or the like,
the system disregards the order of my terms. I.e., it will find
covered within 5 of wagon and it will also find wagon within 5 of covered.
I wanted to see this in
Because I'm really interested in the guts of Lucene, I dug even
deeper
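Lucene's sloppy phrase matching lets terms match out of order within the slop distance, which is why both orders are found. An ordered variant would only accept the second term after the first. The sketch below checks that condition on two sorted position lists; OrderedProximity is a hypothetical name, it is not Lucene's PhraseQuery scoring, and "at most slop positions in between" is an assumed definition of slop:

```java
/**
 * Hypothetical check for an ordered two-term proximity match: does the
 * second term occur after the first with at most `slop` positions between
 * them? Both position arrays must be sorted ascending.
 */
final class OrderedProximity {
    private OrderedProximity() {}

    static boolean matches(int[] firstPositions, int[] secondPositions, int slop) {
        int j = 0;
        for (int p1 : firstPositions) {
            // Skip occurrences of the second term at or before p1.
            while (j < secondPositions.length && secondPositions[j] <= p1) {
                j++;
            }
            if (j == secondPositions.length) {
                return false; // nothing left after this (or any later) p1
            }
            // secondPositions[j] is the closest occurrence after p1.
            if (secondPositions[j] - p1 - 1 <= slop) {
                return true;
            }
        }
        return false;
    }
}
```

Because both lists are sorted, the index j only moves forward, so the check is linear in the number of positions.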
On Tuesday, September 2, 2003, at 07:39 PM, Erik Hatcher wrote:
Is there any way to make the system respond only to the order of the
terms as entered in the query?
I'm sure there is a way to make an OrderedPhraseQuery
A couple of thoughts on this:
- Eclipse uses Lucene for its code indexing/searching (I learned this
at the OSCON Keynote by Eclipse folks). Perhaps looking at how Eclipse
does its thing would be useful even if not the solution.
- XDoclet could be used to sweep through Java code and build a
On Thursday, September 4, 2003, at 09:19 AM, petite_abeille wrote:
- XDoclet could be used to sweep through Java code and build a
text/XML file as richly as you'd like from the information there
(complete with JavaDoc tags, which Zapata will miss :)),
Correct. This happens to be on purpose :)
I replied to your previous message, but maybe it got lost in my e-mail
address shuffling
Have a look at QueryFilter. It will allow you to do search refining as
you desire.
Erik
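In Lucene 1.x a Filter hands back a java.util.BitSet of document numbers (Filter.bits(IndexReader)), and QueryFilter fills it from a query's matches; a later search only keeps documents whose bit is set. The refinement step therefore reduces to intersecting bit sets, as in this standalone sketch. RefineSearch is hypothetical and the hit sets are supplied by hand rather than computed from an index:

```java
import java.util.BitSet;

/**
 * Simulation of refining a search in the spirit of QueryFilter:
 * previousHits holds the first query's document numbers, and the refined
 * result keeps only new hits that are also in that set.
 */
final class RefineSearch {
    private RefineSearch() {}

    static BitSet refine(BitSet previousHits, BitSet newQueryHits) {
        BitSet result = (BitSet) newQueryHits.clone(); // leave the inputs untouched
        result.and(previousHits);
        return result;
    }
}
```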
On Friday, September 5, 2003, at 06:23 AM, Gabriel Boian wrote:
Because I haven't found any solution I must
On Friday, September 5, 2003, at 02:36 PM, Chris Sibert wrote:
Synonym searching might be desirable, but now that I'm thinking about
it, also likely not important.
This could be done with a custom Analyzer.
Associated Words - sounds very interesting, like 'gold' might return
'metal' also, etc.
On Friday, September 5, 2003, at 07:45 PM, Leo Galambos wrote:
And for the second time today: QueryFilter. It allows narrowing
the documents queried to only the documents from a previous Query.
I guess, it would not be an ideal solution - the first query does two
things a) it selects a
Which Analyzer are you using?
Look at the Analysis section of my article here:
http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html
It will help you construct a simple test to see what your Analyzer is
or is not doing.
Erik
On Monday, September 8, 2003, at 12:55 PM, Onglatco,
I've been lurking on these related threads to see what others came up
with.
The real issue here is defining what the content really is. If the
web application is solely your own creation, then I feel crawling or
weird magical tricks with Struts/Tiles/taglibs is not the proper
approach.
On Friday, September 12, 2003, at 02:52 PM, Robert Taylor wrote:
I agree that this makes indexing rather straightforward, but
then I have to build/use a content management system for my
existing web application(s). That's not going to fly for me right now.
Maybe in the future - but for now,
On Friday, September 12, 2003, at 09:05 PM, Marco Tedone wrote:
Hi, I was wondering why Hits is not serializable?
Hits is really a collection of pointers back to the documents, not a
standalone collection.
What is your standard way to pass the matching documents between
different contexts?
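Because Hits loads documents on demand from an open searcher, one common workaround is to copy the stored field values you need into plain serializable objects before handing results to another tier. A sketch under that assumption; SearchResultSnapshot is a hypothetical class, and each document is represented as a field-name-to-value map instead of a Lucene Document:

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical serializable snapshot of search results. Each hit's stored
 * fields are copied into plain maps while the searcher is still open, so the
 * snapshot no longer points back into the index.
 */
final class SearchResultSnapshot implements Serializable {
    private static final long serialVersionUID = 1L;

    // ArrayList and LinkedHashMap are both Serializable.
    private final ArrayList<Map<String, String>> documents = new ArrayList<>();

    /** Copies one document's stored field values (field name to value). */
    void add(Map<String, String> storedFields) {
        documents.add(new LinkedHashMap<>(storedFields));
    }

    List<Map<String, String>> documents() {
        return Collections.unmodifiableList(documents);
    }
}
```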
Well done!
On Friday, September 12, 2003, at 10:16 PM, Marco Tedone wrote:
Hi , I would like to share with you the solution I chose at the end to
create a search facility for my Struts application. Basically, I
followed Erik's suggestion to realize something independent from the actual
On Saturday, September 13, 2003, at 11:08 AM, Eric Jain wrote:
and I'm very particular
about URLs and how they map to content
I, too, am very particular about that.
A bit off topic, but I'll indulge myself for at least one more message
on this. Here is a fun link I've been saving for a while:
On Sunday, September 14, 2003, at 07:42 PM, Sebastien X wrote:
For indexing my documents I use this code :
writer = new IndexWriter(index, new StandardAnalyzer(), create);
I use two fields, content and title, but I would like to use different
Analyzers for these two fields (one that could use numerical and the
please keep the discussions on the lucene-user e-mail list.
of course the source code will be available... what is there is already
in Lucene's CVS and I will just revamp what is there and commit it.
and when we make lucene releases it will be bundled and made available
as a single download
On Wednesday, September 17, 2003, at 08:43 AM, Killeen, Tom wrote:
I would suggest XML as well.
Again, I'd like to hear more about how you'd do this generically. Tell
me what the field names and values would correspond to when presented
with an XML file.
Erik
On Wednesday, September 17, 2003, at 09:21 AM, Pitre, Russell wrote:
I know this may be far-fetched, but how about being able to index
.jsp's? I know this is a spindle thing, but it seems a lot of people
need this functionality.
Like I communicated in a previous thread, indexing JSP's just
And with the latest Lucene codebase in CVS, you could also use a
DateFilter wrapped inside a CachingWrapperFilter instead of a
QueryFilter. Just wanted to mention what is now available.
But I'll reiterate what Doug says... be sure to save off the filter
instance so you don't take the
Try using IndexSearcher.explain and dump out the contents of what it
returns either as toString or toHtml (whichever format suits your
environment best) and see what it has to say. It'll give you the
low-down on the factors involved in the score calculation. I'm
interested to see what you
On Friday, September 19, 2003, at 11:15 AM, hui wrote:
1. Move the Analyzer down to field level from document level so some
fields could use a special analyzer. Other fields still use the default
analyzer from the document level.
For example, I do not need to index the number for the
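The request boils down to a default analyzer plus per-field overrides. Later Lucene releases provide PerFieldAnalyzerWrapper for this; the standalone sketch below only shows the dispatch idea, with plain functions standing in for Analyzers. PerFieldTokenizer and its two example tokenizers are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/**
 * Sketch of per-field analysis: a default tokenizer plus overrides keyed by
 * field name. Tokenizers are plain functions standing in for Analyzers.
 */
final class PerFieldTokenizer {
    private final Function<String, List<String>> defaultTokenizer;
    private final Map<String, Function<String, List<String>>> overrides = new HashMap<>();

    PerFieldTokenizer(Function<String, List<String>> defaultTokenizer) {
        this.defaultTokenizer = defaultTokenizer;
    }

    void setTokenizer(String field, Function<String, List<String>> tokenizer) {
        overrides.put(field, tokenizer);
    }

    List<String> tokenize(String field, String text) {
        return overrides.getOrDefault(field, defaultTokenizer).apply(text);
    }

    /** Example: lowercase runs of ASCII letters, so numbers are dropped. */
    static List<String> lettersOnly(String text) {
        return split(text, "[^a-z]+");
    }

    /** Example: lowercase whitespace-separated words, numbers kept. */
    static List<String> whitespace(String text) {
        return split(text, "\\s+");
    }

    private static List<String> split(String text, String separatorRegex) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split(separatorRegex)) {
            if (!piece.isEmpty()) tokens.add(piece);
        }
        return tokens;
    }
}
```

With "content" mapped to lettersOnly, numbers in that field are never indexed while other fields keep them.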
I'm going to swap in the neko HTML parser for the demo refactorings I'm
doing. I would be all for replacing the demo HTML parser with this.
If you look at the Ant index task in the sandbox, you'll see that I
used JTidy for it and it works well, but I've heard that neko is faster
and better
On Friday, September 19, 2003, at 07:45 PM, Erik Hatcher wrote:
On Friday, September 19, 2003, at 11:15 AM, hui wrote:
1. Move the Analyzer down to field level from document level so some
fields could use a special analyzer. Other fields still use the default
analyzer from the document
StandardAnalyzer removes stop words and "a" is one of them. That is
why you have issues with that phrase.
Erik
On Sunday, September 21, 2003, at 06:13 PM, Niall Lennon wrote:
I'm currently using the MultiFieldQueryParser to search across four
fields. I'm searching for phrases so I've wrapped
On Sunday, September 21, 2003, at 05:38 AM, Senthil Kumar K wrote:
Hi,
Erik thanks for your reply.
How can I do this using QueryParser expressions?
My personal take is using the date support in Lucene is not all that
fun or easy because of the contortions necessary to deal with it. But,
Yes, you can do numeric searches as long as you realize it's really just
text that is indexed. You will need to ensure the Analyzer you use
indexes numbers appropriately as well.
Erik
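Since the indexed numbers are text, a range over them is a string comparison, and "9" sorts after "10". A common remedy is to left-pad every number to the same width before indexing (and before building the query bounds). A minimal sketch; NumberPadding and the 10-digit width are assumptions chosen for illustration:

```java
import java.util.Locale;

/**
 * Sketch: numbers are indexed as text, so range checks are string
 * comparisons. Left-padding to a fixed width (10 digits, an arbitrary choice)
 * makes string order agree with numeric order for 0..9,999,999,999.
 */
final class NumberPadding {
    private static final long MAX = 9_999_999_999L;

    private NumberPadding() {}

    static String pad(long value) {
        if (value < 0 || value > MAX) {
            // Negative or wider values would need a different encoding.
            throw new IllegalArgumentException("out of range: " + value);
        }
        return String.format(Locale.ROOT, "%010d", value);
    }
}
```

The Analyzer must also leave these padded strings intact, which is the point made above about indexing numbers appropriately.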
On Monday, September 22, 2003, at 02:06 AM, Senthil Kumar K wrote:
Hi,
I found that lucene is a
Ah, this is a fun one: lots of fiddly issues with how queries work
and how QueryParser works. I'll take a stab at some of these inline
below
On Monday, September 22, 2003, at 08:26 PM, Dan Quaroni wrote:
I have a simple command line interface for testing.
Interesting interface. Looks
Better yet, submit some JUnit test cases that show how this stuff
works, if the ones in Lucene's codebase aren't comprehensive enough.
This is an excellent way to play with an API and get a good
understanding of it and documenting it at the same time.
Erik
On Tuesday, September 23, 2003, at 10:09 AM, Dan Quaroni wrote:
Yeah, thanks a lot for your help! I'm using the release version of
Lucene
version 1.2.
Perhaps give the latest codebase a try too, just to see if any fixes
(particularly in that WildcardQuery.toString) are there.
you're getting
On Friday, September 26, 2003, at 07:58 AM, Robert Selvaraj wrote:
The SearchBlox FREE Edition is available free of charge and can index
up to 1000 HTML documents.
This looks very slick! I'd guess you'd not be interested in
open-sourcing what you've done, but here's an idea to consider - use
Robert,
On Friday, September 26, 2003, at 09:45 AM, Robert Selvaraj wrote:
Thanks Eric.
That's Erik, with a k :))
I definitely like the idea of including the FREE Edition's WAR file in
the Lucene code base as a demo application. On one hand, it
demonstrates the power of Lucene and on the
On Friday, September 26, 2003, at 10:16 AM, Robert Selvaraj wrote:
It will not be possible to provide you with the source code!
All the editions of SearchBlox are built on the same code base and
releasing the source code to Lucene will have an adverse impact on our
commercial versions with the
We could put this in the Lucene sandbox CVS perhaps. Could you package
it similarly to the other contributions there with a build file and
convert your command-line tests to JUnit tests that run from the build
file?
I took a quick look and looks like you did a fair bit of work and have
On Sunday, September 28, 2003, at 09:20 AM, Pierrick Brihaye wrote:
We can have no GPL code in Apache's CVS.
:-/ So how can we do it? Shall I split the packages in two parts? No
problem for the Lucene bindings. But there could be one for the
aramorph package (java port of the original work),
On Sunday, September 28, 2003, at 10:08 AM, Pierrick Brihaye wrote:
It probably wouldn't be a bad idea to have
some type of repository of Lucene extensions hosted elsewhere anyway
to
solve the GPL issue.
Totally agree!
So... it is an ASL infringement on my part to have prepended
Nightly builds are here:
http://gump.covalent.net/jars/latest/jakarta-lucene/
On Thursday, October 2, 2003, at 11:15 AM, Guilherme Barile wrote:
Can someone tell me where to download lucene 1.3 please ?
On Tue, 2003-09-30 at 17:30, Guilherme Barile wrote:
I just reinstalled my system, and
The CHANGES.txt file says this about the issue:
2. Changed file locking to place lock files in
System.getProperty("java.io.tmpdir"), where all users are
permitted to write files. This way folks can open and correctly
lock indexes which are read-only to them.
I haven't thought through
On Friday, October 3, 2003, at 05:33 PM, Dan Quaroni wrote:
I'm running lucene 1.2, and when I do the following query I get the
following exception:
name:of^1
Works fine with the latest CVS version, though.
What are you searching for? Are you trying to indicate a boost factor
of 1 (which is
Hits is not a java.util.Collection, so you cannot directly iterate
through it using logic:iterate. There are some Lucene-specific tag
libraries that may do what you're after within the
jakarta-lucene-sandbox CVS repository (you'll have to check out that
module and build them yourself,
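Hits is accessed by position (length() and doc(i)) rather than through java.util.Collection, so a small adapter can present it as an Iterator for code that expects one. A sketch, assuming the IntFunction stands in for a call such as hits.doc(i) (the real call throws IOException, which would need wrapping); IndexedIterator is a hypothetical name:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

/**
 * Adapter sketch: exposes an index-addressed result list (a length plus
 * "fetch item i") as a java.util.Iterator.
 */
final class IndexedIterator<T> implements Iterator<T> {
    private final int length;
    private final IntFunction<T> fetch;
    private int next = 0;

    IndexedIterator(int length, IntFunction<T> fetch) {
        this.length = length;
        this.fetch = fetch;
    }

    @Override
    public boolean hasNext() {
        return next < length;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return fetch.apply(next++); // fetch lazily, one item at a time
    }
}
```

Fetching lazily keeps the adapter cheap when only the first page of results is displayed.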
Mike,
I agree that MultiQueryParser could use some enhancements to bring it
closer to QueryParser's capabilities. I'd recommend that setters be
added rather than adding another constructor - it just makes it easier
to add more tweakable parameters later instead of adding new
constructors,
On Friday, October 10, 2003, at 04:30 AM, Ulrich Mayring wrote:
when I search for "MS-Word" I get all the documents that contain
exactly that word, which is good. If, however, I search for MS-Word
(without the quotes), then the MultiFieldQueryParser restructures the
query to "MS -Word" and I
Look at Lucene's build file (the one in CVS) and how it deals with this
situation. It does this:
<target name="javacc-StandardAnalyzer" depends="init,javacc-check"
        if="javacc.present">
  <!-- generate this in a build directory so we can exclude
       ParseException -->
  <mkdir
I agree that the current behavior is broken and will gladly patch it
myself. I'm CC'ing lucene-dev to see if there are any objections. If
there are no objections, I'll apply this patch in a couple of days.
Erik
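The rule being argued for in this thread is that "-" acts as the NOT (prohibit) operator only when it begins a word, so hyphens inside words such as t-shirt or Wal-Mart stay part of the term. The standalone sketch below illustrates that rule on a whitespace-split query; it is not QueryParser's grammar or the actual patch, and HyphenRule is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical query-splitting rule: a '-' is the prohibit operator only at
 * the start of a word (start of query or after whitespace). Hyphens inside
 * a word are kept as part of the term.
 */
final class HyphenRule {
    private HyphenRule() {}

    /** Returns clauses, prefixing prohibited ones with "NOT ". */
    static List<String> clauses(String query) {
        List<String> result = new ArrayList<>();
        for (String word : query.trim().split("\\s+")) {
            if (word.isEmpty()) continue; // empty query
            if (word.length() > 1 && word.charAt(0) == '-') {
                result.add("NOT " + word.substring(1));
            } else {
                result.add(word);
            }
        }
        return result;
    }
}
```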
On Monday, October 20, 2003, at 12:00 PM, Steve Jenkins wrote:
Hi,
Wonder if anyone can help. Has anyone used Lucene on a Windows
environment?
Anyone know of any documentation specifically focused on doing that?
Or anyone know of any gotchas to avoid?
Yup, used Lucene on Windows lots. Is there
On Monday, October 20, 2003, at 11:06 AM, Tom Howe wrote:
contain Section and Study information and then, if a user wants a set
of
Study documents, just aggregate them after the search by hand or is
there a more lucene way of doing this? I'm trying to avoid storing
too much redundant
On Wednesday, October 15, 2003, at 10:24 AM, Michael Giles wrote:
So how do we move this issue forward. I can't think of a single case
where a "-" with no whitespace on either side (e.g. t-shirt, Wal-Mart)
should be interpreted as a NOT command. Is there a feeling that
changing the
Is anyone doing anything interesting with the
Token.setPositionIncrement during analysis?
Just for fun, I've written a simple stop filter that bumps the position
increments to account for the stop words removed:
public final Token next() throws IOException {
int increment = 0;
for
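The filter code above is cut off in the archive. The idea it describes can be simulated without Lucene: count every input word, skip stop words, and give the next surviving token the accumulated increment. Tok and GapPreservingStopFilter are hypothetical stand-ins for Lucene's Token and TokenFilter, working on a list of words rather than a token stream:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Standalone simulation of a stop filter that preserves gaps: removed stop
 * words add their position increment to the next kept token, so
 * "phone the boy" yields phone (+1) and boy (+2).
 */
final class GapPreservingStopFilter {
    static final class Tok {
        final String text;
        final int increment;

        Tok(String text, int increment) {
            this.text = text;
            this.increment = increment;
        }

        @Override
        public String toString() {
            return text + "+" + increment;
        }
    }

    private GapPreservingStopFilter() {}

    static List<Tok> filter(List<String> words, Set<String> stopWords) {
        List<Tok> out = new ArrayList<>();
        int increment = 0;
        for (String word : words) {
            increment++; // every input word advances the position by one
            if (stopWords.contains(word)) continue; // carry the gap forward
            out.add(new Tok(word, increment));
            increment = 0;
        }
        return out;
    }
}
```

With the gap kept, a phrase query for "phone boy" would no longer line up with "phone the boy", which is the behavior questioned further down the thread.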
On Tuesday, October 21, 2003, at 03:36 AM, Pierrick Brihaye wrote:
The basic idea is to have several tokens at the same position (i.e.
setPositionIncrement(0)) which are different possible stems for the
same word.
Right. Like I said, I recognize the benefits of using a position
increment of
On Tuesday, October 21, 2003, at 07:31 PM, Otis Gospodnetic wrote:
So "phone boy" would match documents containing "phone the boy"? That
doesn't sound right to me, as it assumes what the user is trying to do.
That is correct: currently a match would be found. Here's a little
test case I'm
On Tuesday, October 28, 2003, at 08:54 AM, William W wrote:
Is there any Lucene best practice ?
Is there anything in particular you're interested in knowing about?
This list and its archives, in conjunction with the jGuru FAQ, are the
best sources for such info currently, as well as
On Tuesday, October 28, 2003, at 11:03 AM, William W wrote:
I've been using Lucene for more than a year in our project. Lucene gave us the
flexibility that we need ( - the like %something syntax :).
But... I would like to know if we are doing everything in the best way
(nothing in particular).
Could