Re: Indexing other documents type than html and txt (XML)

2001-11-30 Thread Erik Hatcher
I second the motion to have a place to store contributed Document generators. I've developed an HTML file handler that creates a Document using JTidy under the covers to DOM'ify it and pull out only the non-HTML tagged text into a content field and strips the title out as a separate field. It

Re: Lucene Build Instructions

2002-02-15 Thread Erik Hatcher
I would go so far as to even recommend that javacc.home be defined in build.xml with a default location that is the recommended home for it, and if it needs adjusting that would happen via build.properties or a -D command-line switch. Again, I'm here to help and volunteer my Ant expertise to

[PATCH] Re: Lucene Build Instructions

2002-02-15 Thread Erik Hatcher
Lucene and I've seen messages from other users with the same problem). Can you make the changes and submit the patches? Then, if everybody agrees, they can be committed. --Daniel -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED]] Sent: Saturday, February 16, 2002

Re: HTML parser

2002-04-19 Thread Erik Hatcher
HttpUnit (which uses JTidy under the covers) makes child's play out of pulling out links and navigating to them. The only caveat (and this would be true for practically all tools, I suspect) is that the HTML has to be relatively well-formed for it to work well. JTidy can be somewhat forgiving

Re: sanity check - index size

2002-05-21 Thread Erik Hatcher
bigger) than your original data set. --Peter On 5/20/02 4:16 PM, Erik Hatcher [EMAIL PROTECTED] wrote: I'm indexing 900+ files (less than 1,000) that total about 15MB in size. These are text files and HTML files. I only index them into a few fields (title, content, filename). My index

Re: Lucene Index File Names

2002-05-22 Thread Erik Hatcher
What build system are you using? Why can't you just blindly pick up the entire index directory and incorporate that? This is what I do with a webapp build that indexes documents during the build and rolls them into the WAR file. All using Ant, of course! :) Erik - Original Message

Re: Standalone Lucene server

2002-05-31 Thread Erik Hatcher
The application we built for our book (Java Development with Ant - http://www.manning.com/antbook/) uses Lucene to build an index from an Ant build (think static documentation here) and then was incorporated in a few different environments: - command-line query tool - Ant query task

Re: Wildcard searching

2002-07-03 Thread Erik Hatcher
Like a charm. - Original Message - From: Rakesh Ayilliath [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Wednesday, July 03, 2002 12:38 AM Subject: Re: Wildcard searching Dear All, How well does Lucene integrate with Struts ? regards, Rakesh.

Re: Ant index task

2002-07-10 Thread Erik Hatcher
Subject: Re: Ant index task Erik, Does this require the latest and greatest Ant 1.5*? I've got 1.4.1. BUILD FAILED /home/otis/cvs-repositories/jakarta/jakarta-lucene-sandbox/projects/ant/build.xml:72: The jar task doesn't support the destfile attribute. Thanks, Otis --- Erik Hatcher

Re: Ant index task

2002-07-10 Thread Erik Hatcher
Ok, it's been relocated to contributions/ant in the sandbox, and the jar task fixed to work with pre-Ant 1.5 (sorry, I'm always running a self-built latest greatest version of Ant :) - Original Message - From: Erik Hatcher [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED

IndexSearcher on JAR resources?

2002-08-28 Thread Erik Hatcher
I searched the archives, but may have missed it. I suspect someone has done this before: How can I read a Lucene index that is stored within a JAR file rather than directly on the file system? I want to integrate a read-only index into a WAR and an EJB environment which has the easiest

Re: IndexSearcher on JAR resources?

2002-09-12 Thread Erik Hatcher
Tim Dawson wrote: I need to do almost exactly the same thing as Erik - create a read-only index on our help webapp that will be packaged inside an ear file. Eventually I'll have a look at implementing this (and of course contributing it back to Lucene's codebase) - its on my to-do list. But

Re: HTML Analyzer?

2002-11-14 Thread Erik Hatcher
If you have a look at the HtmlDocument class in the ant contributions directory of jakarta-lucene-sandbox in Jakarta's CVS. http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/ant/src/main/org/apache/lucene/ant/HtmlDocument.java?annotate=1.1 I wrote this and it uses JTidy to

Re: HTML saga continues...

2002-12-12 Thread Erik Hatcher
Look in the Lucene sandbox in CVS. I contributed an Ant task that indexed HTML documents. It uses JTidy under the covers to parse HTML into title and body content, and it could be extended to pull other information such as meta keywords. Erik Leo Galambos wrote: So, I have tried this with

Re: HTML saga continues...

2002-12-12 Thread Erik Hatcher
information. Let me know if you experience any issues with it, or have comments. Erik Erik Hatcher wrote: Look in the Lucene sandbox in CVS. I contributed an Ant task that indexed HTML documents. It uses JTidy under the covers to parse HTML into title and body content, and it could be extended

Querying for documents that have a field

2002-12-30 Thread Erik Hatcher
Is it possible to get a collection of documents based on whether they have a particular field (regardless of value)? I'm indexing HTML documents, and want to pull out some information that may or may not be present in the documents (and adding a field if that information exists but not

Re: How to obtain unique field values

2002-12-30 Thread Erik Hatcher
- From: Erik Hatcher [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Monday, December 30, 2002 12:24 PM Subject: How to obtain unique field values One final Lucene question for the day... Is it possible for me to retrieve all the values of a particular field that exists within an index, across all

Re: QueryParser question

2002-12-31 Thread Erik Hatcher
a different Analyzer (WhitespaceAnalyzer? Maybe you can try that one before writing your own). Otis --- Erik Hatcher [EMAIL PROTECTED] wrote: I have a keyword field that has a value like: /path/to/something. Is there a way I can use QueryParser to get documents that have that field value? It seems

Re: QueryParser question

2002-12-31 Thread Erik Hatcher
Doug, Your points are well taken and I appreciate your time in replying to this. I'm on the same wavelength with this thinking about QueryParser, and I realize I'm attempting to push it past its designed simplicity. I'm not as knowledgeable (and who is?!) on Lucene's API and design as you

SearchBean in action?

2003-01-19 Thread Erik Hatcher
Are folks using the SearchBean code found in the contributions/searchbean area of the sandbox? I've just spent a while attempting to get an example working with it and I've ended up having to tweak the code to get it working partially. First, I encountered a compilation error because of an

Re: SearchBean in action?

2003-01-20 Thread Erik Hatcher
On Sunday, January 19, 2003, at 11:17 PM, Terry Steichen wrote: I've recently made some specific additional changes (regarding scoring, as I recall). At this point I don't recall which of these have been included in the distribution and which have not. I'm using the latest CVS HEAD version

Re: SearchBean in action?

2003-01-20 Thread Erik Hatcher
Terry, Thanks for sharing your changes. I have commit access to the sandbox, so if I get into using the HitsIterator more I'll be happy to commit these types of changes as long as no other dev folks complain. If you can find the time, please post your patches to Bugzilla for safe keeping,

Re: SearchBean in action?

2003-01-20 Thread Erik Hatcher
On Monday, January 20, 2003, at 07:33 PM, Terry Steichen wrote: Erik, You're welcome to the changes. But (a) I'm not sure the changes are general enough to appropriately be added to the code base The current CVS of HitsIterator is hard-coded to a single field to sort upon. Your code is

Re: Range queries

2003-01-22 Thread Erik Hatcher
Unfortunately I don't believe date field range queries work with QueryParser, or at least not human-readable dates. Is that correct? I think it supports date ranges if they are turned into a numeric format, but no human would type that kind of query in. I'm sure supporting true date range

Re: Range queries

2003-01-22 Thread Erik Hatcher
-form dates (provide some type of UI element to enter year, month, day separately) or give some copy stating dates should be in MMDD format. -Mike. Erik Hatcher wrote: Unfortunately I don't believe date field range queries work with QueryParser, or at least not human-readable dates

Re: Range queries

2003-01-22 Thread Erik Hatcher
human being, I don't know.) Regards, Terry PS: Just to clarify, I believe that dates represented this way are internally treated as strings by Lucene. - Original Message - From: Erik Hatcher [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Wednesday, January 22, 2003 9:49
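Editor's sketch for the point Terry makes above, that such dates are compared as strings. This is a minimal, standalone Java illustration and not Lucene code: the class name `DateKeySketch`, the `yyyyMMdd` layout, and the `inRange` helper are assumptions made for the example, and it does not reproduce Lucene's own date encoding or what QueryParser does with a date typed by a user. It only shows why a fixed-width, year-first key keeps string order and calendar order in step, while a month-first key does not.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Lucene: compares date keys purely as strings.
final class DateKeySketch {
    // Year-first, fixed-width: string order matches chronological order.
    private static final DateTimeFormatter SORTABLE = DateTimeFormatter.ofPattern("yyyyMMdd");
    // Month-first, two-digit year: string order does NOT match chronological order.
    private static final DateTimeFormatter US_STYLE = DateTimeFormatter.ofPattern("MM/dd/yy");

    static String sortableKey(LocalDate date) {
        return date.format(SORTABLE);
    }

    static String usStyleKey(LocalDate date) {
        return date.format(US_STYLE);
    }

    // Inclusive range check using plain lexicographic comparison.
    static boolean inRange(String key, String begin, String end) {
        return key.compareTo(begin) >= 0 && key.compareTo(end) <= 0;
    }

    public static void main(String[] args) {
        LocalDate begin = LocalDate.of(2002, 1, 1);
        LocalDate end = LocalDate.of(2002, 12, 31);
        LocalDate outside = LocalDate.of(2003, 1, 15); // not in 2002

        // Sortable keys: the 2003 date is correctly rejected.
        System.out.println(inRange(sortableKey(outside), sortableKey(begin), sortableKey(end)));

        // Month-first keys: the same 2003 date is wrongly accepted,
        // because "01/15/03" sorts between "01/01/02" and "12/31/02".
        System.out.println(inRange(usStyleKey(outside), usStyleKey(begin), usStyleKey(end)));
    }
}
```

If the terms stored in the index really are compared as strings, a key layout like the first one is the safer choice for range comparisons; whether that matches a given index depends on how the field was written.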

Re: Range queries

2003-01-22 Thread Erik Hatcher
; String end = 12/31/02; And the results were the same, all expected documents were returned. It does not work to use null for begin or end to leave either side of the range open-ended. Erik On Wednesday, January 22, 2003, at 08:56 PM, Erik Hatcher wrote: I wanted to see this first-hand, so I

Re: Range queries

2003-01-22 Thread Erik Hatcher
So you are using 1.3dev1? I'm not having the same effect you are - and looking at QueryParser.jj it does this: private Query getRangeQuery(String field, Analyzer analyzer, String part1, String part2,

Re: Range queries

2003-01-22 Thread Erik Hatcher
on this thread. Back to your regular programming... On Wednesday, January 22, 2003, at 11:28 PM, Erik Hatcher wrote: Ah, maybe its how we are indexing our fields differently? How are you indexing your my_date_field? I'm using this syntax: Field.Keyword(fieldName, new Date()) Maybe you are indexing

Date fields (was: Re: Range queries)

2003-01-23 Thread Erik Hatcher
Again my apologies to Terry who was patient and confident despite my misunderstanding. I saw a change earlier today (on lucene-dev) with the QueryParser syntax updated with more details on the range query format. I think the ambiguity of saying a date field is still there though, since the

Re: no-index or index

2003-01-30 Thread Erik Hatcher
If you look at the contributions/ant area of the Lucene sandbox in CVS you'll see my HtmlDocument class which uses JTidy. Rather than making up some invalid HTML tag, I'd recommend you separate your navigation section with a div or span with a special class=navigation or something like

Re: no-index or index

2003-01-30 Thread Erik Hatcher
On Thursday, January 30, 2003, at 06:59 PM, Michael Wechner wrote: Well, I haven't found out how to use JTidy to ignore such tags that have such a class. You did it the way I envisioned. I did not expect JTidy to have a way to ignore tags either, but rather having to do it the laborious way

Re: How to index a Word document

2003-01-30 Thread Erik Hatcher
My current day job project uses SQL Server (yes, Slammer hit several folks, sheesh!) and it has built-in full-text indexing of IMAGE fields that correspond to Office documents (or anything Index Server can index, I suppose). It works very well and made our Word document indexing issue a

Re: Wildcard workaround

2003-05-29 Thread Erik Hatcher
The problem with that solution is the same as what the other thread about OutOfMemory is discussing with wildcard queries. Just prefixing something with a fixed query to 'hack' a wildcard query could lead to performance/memory issues. I recommend indexing the file extension (or mime type) as a
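Editor's sketch of the recommendation above to index the file extension as its own value rather than leaning on a wildcard query. This is plain Java with no Lucene calls; the class name `ExtensionFieldSketch` and the method `extensionOf` are hypothetical, and how the resulting value is added to a document is left out because the original message is truncated at that point.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: derives a value that could be
// indexed as a separate field so searches do not need a wildcard.
final class ExtensionFieldSketch {
    // Returns the lower-cased extension, or "" when the name has none.
    static String extensionOf(String filename) {
        int slash = Math.max(filename.lastIndexOf('/'), filename.lastIndexOf('\\'));
        int dot = filename.lastIndexOf('.');
        // Ignore dots in directory names, leading dots (".bashrc"), and trailing dots.
        if (dot <= slash + 1 || dot == filename.length() - 1) {
            return "";
        }
        return filename.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("docs/report.PDF"));
        System.out.println(extensionOf("archive.tar.gz"));
        System.out.println(extensionOf("dir.v2/README").isEmpty());
    }
}
```

An exact-match query on such a value touches a single term, which is the property that avoids the memory and performance concerns raised about expanding wildcards.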

Re: Adding to the same Keyword field

2003-07-11 Thread Erik Hatcher
It will make multiple tokens within the same field just as if you had used a tokenized field with all the values (and your analyzer split them in the manner you're adding them). Does that make sense? In other words, adding keywords foo and bar is the same as using a text field as foo bar

Re: Luke - Lucene Index Browser

2003-07-16 Thread Erik Hatcher
Very well done On Monday, July 14, 2003, at 11:47 AM, Andrzej Bialecki wrote: Dear Lucene Users, Luke is a diagnostic tool for Lucene (http://jakarta.apache.org/lucene) indexes. It enables you to browse documents in existing indexes, perform queries, navigate through terms, optimize

Re: access denied to indexDir/segments

2003-07-18 Thread Erik Hatcher
On Friday, July 18, 2003, at 01:35 PM, Raible, Matt wrote: I'm planning on having different index directories for each user, in the format indexDir/username. This will allow a single site to be indexed, or I can loop through all the users and index them all. We're going to leave up the

Re: Multiple fields identical terms.

2003-07-30 Thread Erik Hatcher
On Wednesday, July 30, 2003, at 06:16 AM, Gregor Heinrich wrote: I would like to have unique term texts in my term enumeration. That is, across all fields there should be no duplicate term text. An easy solution would be to only use one field. But does someone know an alternative way with

Java.net Lucene article

2003-07-30 Thread Erik Hatcher
A Lucene Intro article I recently wrote for java.net has just published: http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html Erik p.s. Am I an official committer now with the repository enabled for ehatcher? If so, then I'll commit this link to the resources section of the site

Re: NLucene up to date ?

2003-07-31 Thread Erik Hatcher
I'd love to see there be quality implementations of the Lucene API in other languages, that are up to date with the latest Java codebase. I'm embarking on a Ruby port, which I'm hosting at rubyforge.org. There is a Python version called Lupy. A related question I have is what about

Re: NLucene up to date ?

2003-07-31 Thread Erik Hatcher
On Thursday, July 31, 2003, at 11:22 AM, Scott Ganyo wrote: Do these implementations maintain file compatibility with the Java version? Lupy claims to. I don't know about NLucene, but it should. Any port of Lucene to another language should (out of respect and common-sense) maintain index

Re: javacc problem + path/link problem in html demo

2003-08-01 Thread Erik Hatcher
]msgId=763762 http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene- [EMAIL PROTECTED]msgId=774059 Another Eric, Erik Hatcher is active in both Ant and Lucene (as I found out in a nofluffjuststuff.com talk) scene. So hopefully good things will come about. Hi Chris! Yeah, I'll (eventually) have

Re: AND instead OR for the search

2003-08-14 Thread Erik Hatcher
QueryParser, at least in Lucene's CVS, supports an attribute to toggle this capability. QueryParser.setOperator(QueryParser.DEFAULT_OPERATOR_AND) On Monday, August 4, 2003, at 06:19 PM, Sebastien X wrote: Hi everybody, When I make a search, for example : Jakarta lucene The search engine

Re: Luke v 0.2 - Lucene Index Browser

2003-08-14 Thread Erik Hatcher
On Monday, August 11, 2003, at 11:12 AM, Günter Kukies wrote: Hi, nice Tool. Here some points for further developments: show the contents of a Reader-valued Field But Reader-valued fields are not stored in the index (Field.Text(name, Reader), that is). The only thing it could show is the

Re: AND instead OR for the search

2003-08-14 Thread Erik Hatcher
You should re-index using 1.3 when you upgrade the library to 1.3. I don't know the differences specifically (without doing some CVS diffs) but there were some changes that affect the index format. Erik On Tuesday, August 5, 2003, at 11:48 AM, hui wrote: Thank you, Sebastien. It works now.

Re: Checkpointable Index

2003-08-14 Thread Erik Hatcher
As long as you finish any index operations when you checkpoint it, there should not be any problems. A Lucene index is just regular ol' filesystem files just like everything else, so there is no problem storing it off and using it later. Erik On Thursday, August 7, 2003, at 12:56 PM, Rob Outar

Re: Hits scoring question

2003-08-18 Thread Erik Hatcher
It's open-source... have a look yourself! :)) On Saturday, August 16, 2003, at 01:41 AM, Rociel Buico wrote: hello, I would like to ask of what algorithm or statistics that Hits.score() method is using? Is it bayesian? tia, buics - Do you Yahoo!? The New

Re: Similar Document Search

2003-08-18 Thread Erik Hatcher
Using the QueryFilter would help with refining a search based on hits from a previous search, but it wouldn't help with the like part you asked about. I'm interested in what you turn up with this though. Erik On Monday, August 18, 2003, at 01:11 PM, Terry Steichen wrote: Is it

Re: Newbie Questions

2003-08-26 Thread Erik Hatcher
On Tuesday, August 26, 2003, at 12:53 AM, Mark Woon wrote: 1) How can I search all fields at the same time? The QueryParser seems to only search one specific field. The common thing I've done and seen others do is glue all the fields together into a master searchable field named something like

Re: IndexReader.delete(Term)?

2003-08-27 Thread Erik Hatcher
What you are doing looks fine to me. I'm sure these are obvious questions, kinda like is your computer plugged in?, but here goes: - How are you determining that the document is still there? With an IndexReader? IndexSearcher? - A freshly created (i.e. after the delete)

Re: One direction phrase searches

2003-09-02 Thread Erik Hatcher
On Tuesday, September 2, 2003, at 04:11 PM, Joe Paulsen wrote: It seems when I do a search such as covered wagon ~5 or the like, the systems disregards the order of my terms. I.E., it will find covered within 5 of wagon and it will also find wagon within 5 of covered. I wanted to see this in

Re: One direction phrase searches

2003-09-02 Thread Erik Hatcher
Because I'm really interested in the guts of Lucene, I dug even deeper On Tuesday, September 2, 2003, at 07:39 PM, Erik Hatcher wrote: Is there any way to make the system respond only to the order of the terms as entered in the query? I'm sure there is a way to make an OrderedPhraseQuery

Re: Lucene app to index Java code

2003-09-04 Thread Erik Hatcher
A couple of thoughts on this: - Eclipse uses Lucene for its code indexing/searching (I learned this at the OSCON Keynote by Eclipse folks). Perhaps looking at how Eclipse does its thing would be useful even if not the solution. - XDoclet could be used to sweep through Java code and build a

Re: Lucene app to index Java code

2003-09-04 Thread Erik Hatcher
On Thursday, September 4, 2003, at 09:19 AM, petite_abeille wrote: - XDoclet could be used to sweep through Java code and build a text/XML file as richly as you'd like from the information there (complete with JavaDoc tags, which Zapata will miss :)), Correct. This happens to be on purpose :)

Re: Fw: Refine the result set

2003-09-05 Thread Erik Hatcher
I replied to your previous message, but maybe it got lost in my e-mail address shuffling Have a look at QueryFilter. It will allow you to do search refining as you desire. Erik On Friday, September 5, 2003, at 06:23 AM, Gabriel Boian wrote: Because I haven't found any solution I must

Re: Lucene features

2003-09-05 Thread Erik Hatcher
On Friday, September 5, 2003, at 02:36 PM, Chris Sibert wrote: Synonym searching might be desirable, but now that I'm thinking about it, also likely not important. This could be done with a custom Analyzer. Associated Words - sounds very interesting, like 'gold' might return 'metal' also, etc.

Re: Lucene features

2003-09-06 Thread Erik Hatcher
On Friday, September 5, 2003, at 07:45 PM, Leo Galambos wrote: And for the second time today QueryFilter. It allows narrowing the documents queried to only the documents from a previous Query. I guess, it would not be an ideal solution - the first query does two things a) it selects a

Re: Can't search using numbers

2003-09-08 Thread Erik Hatcher
Which Analyzer are you using? Look at the Analysis section of my article here: http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html It will help you construct a simple test to see what your Analyzer is or is not doing. Erik On Monday, September 8, 2003, at 12:55 PM, Onglatco,

Re: Lucene and Struts

2003-09-12 Thread Erik Hatcher
I've been lurking on these related threads to see what others came up with. The real issue here is defining what the content really is. If the web application is solely your own creation, then I feel crawling or weird magical tricks with Struts/Tiles/taglibs is not the proper approach.

Re: Lucene and Struts

2003-09-12 Thread Erik Hatcher
On Friday, September 12, 2003, at 02:52 PM, Robert Taylor wrote: I agree that this makes indexing rather straight forward, but then I have to build/use a content management system for my existing web application(s). That's not going to fly for me right now. Maybe in the future - but for now,

Re: Why Hits is not serializable?

2003-09-12 Thread Erik Hatcher
On Friday, September 12, 2003, at 09:05 PM, Marco Tedone wrote: Hi, I was wondering why Hits is not serializable? Hits is really a collection of pointers back to the documents, not a standalone collection. Which is your standard way to pass the matching documents between different context?

Re: How I created a Struts search facility

2003-09-13 Thread Erik Hatcher
Well done! On Friday, September 12, 2003, at 10:16 PM, Marco Tedone wrote: Hi , I would like to share with you the solution I chose at the end to create a search facility for my Struts application. Basically, I followed the Erik's suggestion to realize something independent from the actual

Re: Lucene and Struts

2003-09-13 Thread Erik Hatcher
On Saturday, September 13, 2003, at 11:08 AM, Eric Jain wrote: and I'm very particular about URL's and how it maps to content I, too, am very particular about that. A bit off topic, but I'll indulge myself for at least one more message on this. Here is a fun link I've been saving for a while:

Re: différents Analyzer for 2 fields

2003-09-14 Thread Erik Hatcher
On Sunday, September 14, 2003, at 07:42 PM, Sebastien X wrote: For indexing my documents I use this code : writer = new IndexWriter(index, new StandardAnalyzer(), create); I use 2 fields content and title but I would use different Analyzers for these 2 fields (one who could use numerical and the

Re: Lucene demo ideas?

2003-09-17 Thread Erik Hatcher
please keep the discussions on the lucene-user e-mail list. of course the source code will be available... what is there is already in lucene's CVS and i will just revamp what is there and commit it. and when we make lucene releases it will be bundled and made available as a single download

Re: Lucene demo ideas?

2003-09-17 Thread Erik Hatcher
On Wednesday, September 17, 2003, at 08:43 AM, Killeen, Tom wrote: I would suggest XML as well. Again, I'd like to hear more about how you'd do this generically. Tell me what the field names and values would correspond to when presented with an XML file. Erik

Re: Lucene demo ideas?

2003-09-17 Thread Erik Hatcher
On Wednesday, September 17, 2003, at 09:21 AM, Pitre, Russell wrote: I know this may be far fetched, but how about being able to index .jsp'sI know this is a spindle thing, but It seems a lot of people need this functionality. Like I communicated in a previous thread, indexing JSP's just

Re: slow performance with Date Range Searching

2003-09-17 Thread Erik Hatcher
And with the latest Lucene codebase in CVS, you could also use a DateFilter wrapped inside a CachingWrapperFilter instead of a QueryFilter. Just wanted to mention what is now available. But I'll reiterate what Doug says... be sure to save off the filter instance so you don't take the

Re: Lucene Scoring Behavior

2003-09-17 Thread Erik Hatcher
Try using IndexSearcher.explain and dump out the contents of what it returns either as toString or toHtml (whichever format suits your environment best) and see what it has to say. It'll give you the low-down on the factors involved in the score calculation. I'm interested to see what you

Re: some requests

2003-09-19 Thread Erik Hatcher
On Friday, September 19, 2003, at 11:15 AM, hui wrote: 1. Move the Analyzer down to field level from document level so some fields could be applied a special analyzer. Other fields still use the default analyzer from the document level. For example, I do not need to index the number for the

Re: HTML Parsing problems...

2003-09-19 Thread Erik Hatcher
I'm going to swap in the neko HTML parser for the demo refactorings I'm doing. I would be all for replacing the demo HTML parser with this. If you look at the Ant index task in the sandbox, you'll see that I used JTidy for it and it works well, but I've heard that neko is faster and better

per-field Analyzer (was Re: some requests)

2003-09-20 Thread Erik Hatcher
On Friday, September 19, 2003, at 07:45 PM, Erik Hatcher wrote: On Friday, September 19, 2003, at 11:15 AM, hui wrote: 1. Move the Analyzer down to field level from document level so some fields could be applied a special analyzer. Other fields still use the default analyzer from the document

Re: MultiFieldQueryParser Phrases Problem

2003-09-21 Thread Erik Hatcher
StandardAnalyzer removes stop words and a is one of them. That is why you have issues with that phrase. Erik On Sunday, September 21, 2003, at 06:13 PM, Niall Lennon wrote: I'm currently using the MultiFieldQueryParser to search across four fields. I'm searching for phrases so i've wrapped

Re: How to search document for modified date

2003-09-21 Thread Erik Hatcher
On Sunday, September 21, 2003, at 05:38 AM, Senthil Kumar K wrote: Hi, Erik thanks for your reply. How to do using QueryParser expressions. My personal take is using the date support in Lucene is not all that fun or easy because of the contortions necessary to deal with it. But,

Re: Is it possible in lucene for numeric search

2003-09-22 Thread Erik Hatcher
Yes, you can do numeric searches as long as you realize it's really just text that is indexed. You will need to ensure the Analyzer you use indexes numbers appropriately as well. Erik On Monday, September 22, 2003, at 02:06 AM, Senthil Kumar K wrote: Hi, I found that lucene is a
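Editor's sketch of what "really just text" implies for numbers. This is standalone Java, not Lucene code, and zero-padding is one common workaround rather than something the message above prescribes; the class name `PaddedNumberSketch`, the method `pad`, and the width of 5 are assumptions chosen for the example.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: makes non-negative integers
// compare correctly when they are treated as plain strings.
final class PaddedNumberSketch {
    // Left-pads with zeros to a fixed width so string order equals numeric order.
    static String pad(int value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch handles non-negative values only");
        }
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    public static void main(String[] args) {
        // As raw text, "9" sorts after "10" because '9' > '1'.
        System.out.println("9".compareTo("10") > 0);

        // With fixed-width padding, "00009" sorts before "00010".
        System.out.println(pad(9, 5).compareTo(pad(10, 5)) < 0);
    }
}
```

The same padding has to be applied both when the value is indexed and when a query term is built, otherwise the two strings will never match.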

Re: Confusion over wildcard search logic

2003-09-23 Thread Erik Hatcher
Ah, this is a fun one lots of fiddly issues with how queries work and how QueryParser works. I'll take a stab at some of these inline below On Monday, September 22, 2003, at 08:26 PM, Dan Quaroni wrote: I have a simple command line interface for testing. Interesting interface. Looks

Re: Confusion over wildcard search logic

2003-09-23 Thread Erik Hatcher
Better yet, submit some JUnit test cases that show how this stuff works, if the ones in Lucene's codebase aren't comprehensive enough. This is an excellent way to play with an API and get a good understanding of it and documenting it at the same time. Erik On Tuesday, September 23, 2003, at

Re: Confusion over wildcard search logic

2003-09-23 Thread Erik Hatcher
On Tuesday, September 23, 2003, at 10:09 AM, Dan Quaroni wrote: Yeah, thanks a lot for your help! I'm using the release version of Lucene version 1.2. Perhaps give the latest codebase a try too, just to see if any fixes (particularly in that WildcardQuery.toString) are there. you're getting

Re: Announcing SearchBlox Search Application 1.0

2003-09-26 Thread Erik Hatcher
On Friday, September 26, 2003, at 07:58 AM, Robert Selvaraj wrote: The SearchBlox FREE Edition is available free of charge and can index up to 1000 HTML documents. This looks very slick! I'd guess you'd not be interested in open-sourcing what you've done, but here's an idea to consider - use

Re: Announcing SearchBlox Search Application 1.0

2003-09-26 Thread Erik Hatcher
Robert, On Friday, September 26, 2003, at 09:45 AM, Robert Selvaraj wrote: Thanks Eric. That's Erik, with a k :)) I definitely like the idea of including the FREE Edition's WAR file to the Lucene code base as a demo application. On one hand, it demonstrates the power of Lucene and on the

Re: Announcing SearchBlox Search Application 1.0

2003-09-26 Thread Erik Hatcher
On Friday, September 26, 2003, at 10:16 AM, Robert Selvaraj wrote: It will not be possible to provide you with the source code! All the editions of SearchBlox are built on the same code base and releasing the source code to Lucene will have an adverse impact on our commercial versions with the

Re: Announce : arabic Stemmer/Analyzer for Lucene

2003-09-28 Thread Erik Hatcher
We could put this in the Lucene sandbox CVS perhaps. Could you package it similarly to the other contributions there with a build file and convert your command-line tests to JUnit tests that run from the build file? I took a quick look and looks like you did a fair bit of work and have

Re: Announce : arabic Stemmer/Analyzer for Lucene

2003-09-28 Thread Erik Hatcher
On Sunday, September 28, 2003, at 09:20 AM, Pierrick Brihaye wrote: We can have no GPL code in Apache's CVS. :-/ How can we do, so ? Shall I split the packages in two parts ? No problems for the Lucene bindings. But there could be one for the aramorph package (java port of the original work),

Re: Announce : arabic Stemmer/Analyzer for Lucene

2003-09-28 Thread Erik Hatcher
On Sunday, September 28, 2003, at 10:08 AM, Pierrick Brihaye wrote: It probably wouldn't be a bad idea to have some type of repository of Lucene extensions hosted elsewhere anyway to solve the GPL issue. Totally agree ! So... it is an ASL infringement from my part to have prepended

Re: Lucene 1.3-rc1 ?

2003-10-02 Thread Erik Hatcher
Nightly builds are here: http://gump.covalent.net/jars/latest/jakarta-lucene/ On Thursday, October 2, 2003, at 11:15 AM, Guilherme Barile wrote: Can someone tell me where to download lucene 1.3 please ? On Tue, 2003-09-30 at 17:30, Guilherme Barile wrote: I just reinstalled my system, and

Re: which lock belong to which index?

2003-10-02 Thread Erik Hatcher
The CHANGES.txt file says this about the issue: 2. Changed file locking to place lock files in System.getProperty(java.io.tmpdir), where all users are permitted to write files. This way folks can open and correctly lock indexes which are read-only to them. I haven't thought through

Re: of^1 illegal?

2003-10-03 Thread Erik Hatcher
On Friday, October 3, 2003, at 05:33 PM, Dan Quaroni wrote: I'm running lucene 1.2, and when I do the following query I get the following exception: name:of^1 Works fine with the latest CVS version, though. What are you searching for? Are you trying to indicate a boost factor of 1 (which is

Re: Struts logic iterate

2003-10-06 Thread Erik Hatcher
Hits is not a java.util.Collection, so you cannot directly iterate through it using logic:iterate. There are some Lucene-specific tag libraries that may do what you're after within the jakarta-lucene-sandbox CVS repository (you'll have to check out that module and build them yourself,

Re: Default AND for multi-field queries...

2003-10-07 Thread Erik Hatcher
Mike, I agree that MultiQueryParser could use some enhancements to bring it closer to QueryParser's capabilities. I'd recommend that setters be added rather than adding another constructor - it just makes it easier to add more tweakable parameters later instead of adding new constructors,

Re: Dash Confusion in QueryParser - Bug? Feature?

2003-10-10 Thread Erik Hatcher
On Friday, October 10, 2003, at 04:30 AM, Ulrich Mayring wrote: when I search for MS-Word I get all the documents that contain exactly that word, which is good. If, however, I search for MS-Word (without the quotes), then the MultiFieldQueryParser restructures the query to MS -Word and I

Re: StandardTokenizer Problem

2003-10-10 Thread Erik Hatcher
Look at Lucene's build file (the one in CVS) and how it deals with this situation. It does this: target name=javacc-StandardAnalyzer depends=init,javacc-check if=javacc.present !-- generate this in a build directory so we can exclude ParseException -- mkdir

Re: Dash Confusion in QueryParser - Bug? Feature?

2003-10-15 Thread Erik Hatcher
I agree that the current behavior is broken and will gladly patch it myself. I'm CC'ing lucene-dev to see if there are any objections. If there are no objections, I'll apply this patch in a couple of days. Erik On Wednesday, October 15, 2003, at 10:24 AM, Michael Giles wrote: So how do we

Re: Lucene on Windows

2003-10-20 Thread Erik Hatcher
On Monday, October 20, 2003, at 12:00 PM, Steve Jenkins wrote: Hi, Wonder if anyone can help. Has anyone used Lucene on a Windows environment? Anyone know of any documentation specifically focused on doing that? Or anyone know of any gotchas to avoid? Yup, used Lucene on Windows lots. Is there

Re: Hierarchical document

2003-10-20 Thread Erik Hatcher
On Monday, October 20, 2003, at 11:06 AM, Tom Howe wrote: contain Section and Study information and then, if a user wants a set of Study documents, just aggregate them after the search by hand or is there a more lucene way of doing this? I'm trying to avoid storing too much redundant

Re: Dash Confusion in QueryParser - Bug? Feature?

2003-10-20 Thread Erik Hatcher
On Wednesday, October 15, 2003, at 10:24 AM, Michael Giles wrote: So how do we move this issue forward. I can't think of a single case where a - with no whitespace on either side (i.e. t-shirt, Wal-Mart) should be interpreted as a NOT command. Is there a feeling that changing the

positional token info

2003-10-20 Thread Erik Hatcher
Is anyone doing anything interesting with the Token.setPositionIncrement during analysis? Just for fun, I've written a simple stop filter that bumps the position increments to account for the stop words removed: public final Token next() throws IOException { int increment = 0; for
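Editor's sketch of the idea described above: a stop filter that records how many stop words were dropped before each surviving term. The code in the message is truncated, so this does not reconstruct it. It is a standalone Java simulation that uses no Lucene classes; `PositionIncrementSketch`, `PositionedTerm`, and `filter` are hypothetical names, and the increment here simply plays the role that the message assigns to Token.setPositionIncrement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation, not Lucene's TokenFilter API.
final class PositionIncrementSketch {
    // A surviving term plus its distance from the previous surviving term.
    static final class PositionedTerm {
        final String text;
        final int increment;

        PositionedTerm(String text, int increment) {
            this.text = text;
            this.increment = increment;
        }

        @Override
        public String toString() {
            return text + "(+" + increment + ")";
        }
    }

    // Drops stop words and adds one to the next term's increment per dropped word.
    static List<PositionedTerm> filter(List<String> terms, Set<String> stopWords) {
        List<PositionedTerm> kept = new ArrayList<>();
        int skipped = 0;
        for (String term : terms) {
            if (stopWords.contains(term)) {
                skipped++;          // remember the gap left by the stop word
                continue;
            }
            kept.add(new PositionedTerm(term, 1 + skipped));
            skipped = 0;
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the"));
        // "boy" gets an increment of 2 because "the" was removed before it.
        System.out.println(filter(Arrays.asList("phone", "the", "boy"), stopWords));
    }
}
```

With the gap preserved, "phone" and "boy" are no longer recorded as adjacent, which is the distinction an exact phrase match could then rely on.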

Re: positional token info

2003-10-21 Thread Erik Hatcher
On Tuesday, October 21, 2003, at 03:36 AM, Pierrick Brihaye wrote: The basic idea is to have several tokens at the same position (i.e. setPositionIncrement(0)) which are different possible stems for the same word. Right. Like I said, I recognize the benefits of using a position increment of

Re: positional token info

2003-10-22 Thread Erik Hatcher
On Tuesday, October 21, 2003, at 07:31 PM, Otis Gospodnetic wrote: So phone boy would match documents containing phone the boy? That doesn't sound right to me, as it assumes what the user is trying to do. That is correct currently a match would be found. Here's a little test case I'm

Re: Best practice

2003-10-28 Thread Erik Hatcher
On Tuesday, October 28, 2003, at 08:54 AM, William W wrote: Is there any Lucene best practice ? Is there anything in particular you're interested in knowing about? This list and its archives, in conjunction with the jGuru FAQ, are the best sources for such info currently as well as

Re: Best practice

2003-10-28 Thread Erik Hatcher
On Tuesday, October 28, 2003, at 11:03 AM, William W wrote: I'm using Lucene more than one year in our project. Lucene gave us the flexibility that we need ( - the like %something syntax :). But... I would like to know if we are doing everything in the best way (nothing in particular). Could
