Re: Automatic prefix search in query parser

2021-09-03 Thread Erik Hatcher
A comparable alternative would be to use the edge ngram filter to index 
prefixes instead.  
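
For anyone wanting to try that, here is a rough sketch of an index-time analyzer for 8.9 (the class name, field handling, and gram sizes are made up for illustration; it assumes the lucene-analyzers-common module is on the classpath, and the EdgeNGramTokenFilter constructor should be checked against your exact version):

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;
    import org.apache.lucene.analysis.standard.StandardTokenizer;

    // Hypothetical index-time analyzer: expands each term into its leading
    // edges ("remender" -> "re", "rem", ..., "remender"), so an ordinary
    // term query for "rem" behaves like a prefix search.
    public class PrefixIndexingAnalyzer extends Analyzer {
        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            Tokenizer source = new StandardTokenizer();
            TokenStream sink = new LowerCaseFilter(source);
            // gram sizes 2..20 are arbitrary; preserveOriginal=true also keeps
            // the full term so exact matches still work
            sink = new EdgeNGramTokenFilter(sink, 2, 20, true);
            return new TokenStreamComponents(source, sink);
        }
    }

At query time you would keep a plain analyzer (no ngrams) with the MultiFieldQueryParser, so whatever the user types is matched against the indexed grams without appending any *.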

Erik


> On Sep 3, 2021, at 10:49 AM, Gauthier Roebroeck wrote:
> 
> Hello,
> 
> I am using Apache Lucene 8.9.0 to parse queries that are entered by humans.
> I am using the
> `org.apache.lucene.queryparser.classic.MultiFieldQueryParser` which works
> very well so far.
> 
> However I would like to automatically use the prefix notation (`*`) for all
> terms in the query, instead of searching for exact terms, so the humans
> entering the queries don't have to type the `*` after each term.
> 
> An example query could be: `author:(murphy OR remender) AND batman AND
> release_date:1999`, which should be transformed to `author:(murphy* OR
> remender*) AND batman* AND release_date:1999*`.
> 
> Is there any way to do this with the Lucene query parsers ? I checked the
> classic and standard query parser, but couldn't find any option to do so.
> 
> If there was a way to decide which fields could automatically be prefixed,
> that would be even better.
> 
> Thanks a lot





Re: [VOTE] Lucene logo contest, here we go again

2020-09-01 Thread Erik Hatcher
D (binding)

> On Aug 31, 2020, at 8:26 PM, Ryan Ernst  wrote:
> 
> Dear Lucene and Solr developers!
> 
> In February a contest was started to design a new logo for Lucene 
> [jira-issue]. The initial attempt [first-vote] to call a vote resulted in 
> some confusion on the rules, as well the request for one additional 
> submission. I would like to call a new vote, now with more explicit 
> instructions on how to vote.
> 
> Please read the following rules carefully before submitting your vote.
> 
> Who can vote?
> 
> Anyone is welcome to cast a vote in support of their favorite submission(s). 
> Note that only PMC member's votes are binding. If you are a PMC member, 
> please indicate with your vote that the vote is binding, to ease collection 
> of votes. In tallying the votes, I will attempt to verify only those marked 
> as binding.
> 
> How do I vote?
> 
> Votes can be cast simply by replying to this email. It is a ranked-choice 
> vote [rank-choice-voting]. Multiple selections may be made, where the order 
> of preference must be specified. If an entry gets more than half the votes, 
> it is the winner. Otherwise, the entry with the lowest number of votes is 
> removed, and the votes are retallied, taking into account the next preferred 
> entry for those whose first entry was removed. This process repeats until 
> there is a winner.
> 
> The entries are broken up by variants, since some entries have multiple color 
> or style variations. The entry identifiers are first a capital letter, 
> followed by a variation id (described with each entry below), if applicable. 
> As an example, if you prefer variant 1 of entry A, followed by variant 2 of 
> entry A, variant 3 of entry C, entry D, and lastly variant 4e of entry B, the 
> following should be in your reply:
> 
> (binding)
> vote: A1, A2, C3, D, B4e
> 
> Entries
> 
> The entries are as follows:
> 
> A. Submitted by Dustin Haver. This entry has two variants, A1 and A2.
> 
> [A1] 
> https://issues.apache.org/jira/secure/attachment/12999548/Screen%20Shot%202020-04-10%20at%208.29.32%20AM.png
>  
> 
> [A2] https://issues.apache.org/jira/secure/attachment/12997172/LuceneLogo.png 
> 
> 
> B. Submitted by Stamatis Zampetakis. This has several variants. Within the 
> linked entry there are 7 patterns and 7 color palettes. Any vote for B should 
> contain the pattern number followed by the lowercase letter of the color 
> palette. For example, B3e or B1a.
> 
> [B] https://issues.apache.org/jira/secure/attachment/12997768/zabetak-1-7.pdf 
> 
> 
> C. Submitted by Baris Kazar. This entry has 8 variants.
> 
> [C1] 
> https://issues.apache.org/jira/secure/attachment/13006392/lucene_logo1_full.pdf
>  
> 
> [C2] 
> https://issues.apache.org/jira/secure/attachment/13006392/lucene_logo2_full.pdf
>  
> 
> [C3] 
> https://issues.apache.org/jira/secure/attachment/13006392/lucene_logo3_full.pdf
>  
> 
> [C4] 
> https://issues.apache.org/jira/secure/attachment/13006392/lucene_logo4_full.pdf
>  
> 
> [C5] 
> https://issues.apache.org/jira/secure/attachment/13006392/lucene_logo5_full.pdf
>  
> 
> [C6] 
> https://issues.apache.org/jira/secure/attachment/13006392/lucene_logo6_full.pdf
>  
> 
> [C7] 
> https://issues.apache.org/jira/secure/attachment/13006392/lucene_logo7_full.pdf
>  
> 
> [C8] 
> https://issues.apache.org/jira/secure/attachment/13006392/lucene_logo8_full.pdf
>  
> 
> 
> D. The current Lucene logo.
> 
> [D] https://lucene.apache.org/theme/images/lucene/lucene_logo_green_300.png 
> 
> 
> Please vote for one of the above choices. This vote will close one week from 
> today, Mon, Sept 7, 2020 at 11:59PM.
> 
> Thanks!
> 
> [jira-issue] https://issues.apache.org/jira/browse/LUCENE-9221 
> 
> [first-vote] 
> http://mail-archives.apache.org/mod_mbox/lucene-dev/202006.mbox/%3cCA+DiXd74Mz4H6o9SmUNLUuHQc6Q1-9mzUR7xfxR03ntGwo=d...@mail.gmail.com%3e
>  
> 

Re: [VOTE] Lucene logo contest

2020-06-16 Thread Erik Hatcher
C - current logo

> On Jun 15, 2020, at 6:08 PM, Ryan Ernst  wrote:
> 
> Dear Lucene and Solr developers!
> 
> In February a contest was started to design a new logo for Lucene [1]. That 
> contest concluded, and I am now (admittedly a little late!) calling a vote.
> 
> The entries are labeled as follows:
> 
> A. Submitted by Dustin Haver [2]
> 
> B. Submitted by Stamatis Zampetakis [3] Note that this has several variants. 
> Within the linked entry there are 7 patterns and 7 color palettes. Any vote 
> for B should contain the pattern number, like B1 or B3. If a B variant wins, 
> we will have a followup vote on the color palette.
> 
> C. The current Lucene logo [4]
> 
> Please vote for one of the three (or nine depending on your perspective!) 
> above choices. Note that anyone in the Lucene+Solr community is invited to 
> express their opinion, though only Lucene+Solr PMC cast binding votes 
> (indicate non-binding votes in your reply, please). This vote will close one 
> week from today, Mon, June 22, 2020.
> 
> Thanks!
> 
> [1] https://issues.apache.org/jira/browse/LUCENE-9221 
> 
> [2] 
> https://issues.apache.org/jira/secure/attachment/12999548/Screen%20Shot%202020-04-10%20at%208.29.32%20AM.png
>  
> 
> [3] https://issues.apache.org/jira/secure/attachment/12997768/zabetak-1-7.pdf 
> 
> [4] https://lucene.apache.org/theme/images/lucene/lucene_logo_green_300.png 
> 



Re: Payload TFIDF Similarity in Lucene 7.1.0

2018-03-13 Thread Erik Hatcher
Payloads are only scored from certain query types.   What query are you 
executing?
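
For reference, a minimal sketch of the kind of query that does consult scorePayload in 7.1 (field and term are invented, and the exact PayloadScoreQuery constructor should be double-checked for your version):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.queries.payloads.MaxPayloadFunction;
    import org.apache.lucene.queries.payloads.PayloadScoreQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.spans.SpanTermQuery;

    public class PayloadQueryExample {
        public static Query build() {
            // Wrap the term in a span query; the payload at each matching
            // position is then fed through the Similarity's payload scoring.
            return new PayloadScoreQuery(
                new SpanTermQuery(new Term("body", "lucene")), // hypothetical field/term
                new MaxPayloadFunction(),   // how payloads from multiple positions combine
                true);                      // also include the wrapped span score
        }
    }

A plain TermQuery (or anything the classic QueryParser produces by default) never reads payloads, which is usually why an overridden scorePayload appears to be ignored.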

> On Mar 13, 2018, at 04:58, Grdan Eenc  wrote:
> 
> Hej there,
> 
> I want to extend the TFIDF Similarity class such that the term frequency is
> neglected and the value in the payload used instead. Therefore I basically
> do this:
> 
>@Override
>public float tf(float freq) {
>return 1f;
>}
> 
>public float scorePayload(int doc, int start, int end, BytesRef
> payload) {
>if (payload != null) {
>return PayloadHelper.decodeFloat(payload.bytes, payload.offset);
>} else {
>return 1f;
>}
>}
> 
> Complete class can be found here:
> 
> https://gist.github.com/nadre/66be2a2a32214f2c5ec1ec1f6edcef08
> 
> Unfortunately the scorePayload never gets called and I end up with the
> wrong scoring. I know that scorePayload is deprecated in Lucene 7.2.1 but
> it should work in 7.1.0 or am I missing something?
> 
> I implemented the same thing by directly extending the basic Similarity
> class and iterating through doc terms using the LeafReaderContext, based on
> the code in this repo:
> 
> https://github.com/sdauletau/elasticsearch-position-similarity
> 
> This works but is horribly slow which is why I would prefer the first idea.
> 
> Any idea why scorePayload doesn't get called? I really couldn't find any
> resources on the net.
> 
> Best, Erdan.




Re: Using POS payloads for chunking

2017-06-14 Thread Erik Hatcher
Markus - how are you encoding payloads as bitsets and using them for scoring?
Curious to see how folks are leveraging them.

Erik

> On Jun 14, 2017, at 4:45 PM, Markus Jelsma  wrote:
> 
> Hello,
> 
> We use POS-tagging too, and encode them as payload bitsets for scoring, which 
> is, as far as I know, the only possibility with payloads.
> 
> So, instead of encoding them as payloads, why not index your treebank's 
> POS-tags as tokens on the same position, like synonyms. If you do that, you 
> can use spans and phrase queries to find chunks of multiple POS-tags.
> 
> This would be the first approach I can think of. Treating them as regular 
> tokens enables you to use regular search for them.
> 
> Regards,
> Markus
> 
> 
> 
> -Original message-
>> From:José Tomás Atria 
>> Sent: Wednesday 14th June 2017 22:29
>> To: java-user@lucene.apache.org
>> Subject: Using POS payloads for chunking
>> 
>> Hello!
>> 
>> I'm not particularly familiar with lucene's search api (as I've been using
>> the library mostly as a dumb index rather than a search engine), but I am
>> almost certain that, using its payload capabilities, it would be trivial to
>> implement a regular chunker to look for patterns in sequences of payloads.
>> 
>> (trying not to be too pedantic, a regular chunker looks for 'chunks' based
>> on part-of-speech tags, e.g. noun phrases can be searched for with patterns
>> like "(DT)?(JJ)*(NN|NP)+", that is, an optional determinant and zero or
>> more adjectives preceding a bunch of nouns, etc)
>> 
>> Assuming my index has POS tags encoded as payloads for each position, how
>> would one search for such patterns, irrespective of terms? I started
>> studying the spans search API, as this seemed like the natural place to
>> start, but I quickly got lost.
>> 
>> Any tips would be extremely appreciated. (or references to this kind of
>> thing, I'm sure someone must have tried something similar before...)
>> 
>> thanks!
>> ~jta
>> -- 
>> 
>> sent from a phone. please excuse terseness and tpyos.
>> 
>> enviado desde un teléfono. por favor disculpe la parquedad y los erroers.
>> 
> 



Re: Odd Boolean Query behavior in SOLR 3.6

2017-06-13 Thread Erik Hatcher
Yes, fq’s make up constraints in conjunction with q.  The issue here, though, is 
_clauses_.  A single negative clause matches nothing.  There is syntactic 
sugar at the Solr level to allow q and fq’s to have a top-level single 
negative clause, like q=-type:pdf to return all non-pdf docs.  That’s a 
shortcut convenience for saying q=*:* -type:pdf.  Once inside nested clauses, 
you have to be explicit.  Gotta match something to exclude stuff.
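
At the Lucene level the same idea looks roughly like this (3.x-era API; a sketch only, with the field values hard-coded from the example below):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.TermQuery;

    public class NegativeClauseExample {
        public static BooleanQuery build() {
            // (*:* -documentTypeId:3) AND companyId:29096
            BooleanQuery inner = new BooleanQuery();
            inner.add(new MatchAllDocsQuery(), Occur.MUST);         // match everything...
            inner.add(new TermQuery(new Term("documentTypeId", "3")),
                      Occur.MUST_NOT);                              // ...then carve out the exclusion
            BooleanQuery query = new BooleanQuery();
            query.add(inner, Occur.MUST);
            query.add(new TermQuery(new Term("companyId", "29096")), Occur.MUST);
            return query;
        }
    }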

Erik


> On Jun 13, 2017, at 10:09 AM, abhi Abhishek <abhi26...@gmail.com> wrote:
> 
> Thanks Erik, This helped and the query is running and gives results as
> expected.
> 
> Thanks for the insight, my understanding here was that fq parameter works
> on the result set of q parameter which is *:* here. shouldn't that be the
> case here?
> 
> Thanks,
> Abhishek
> 
> 
> 
> On Tue, Jun 13, 2017 at 6:02 PM, Erik Hatcher <erik.hatc...@gmail.com>
> wrote:
> 
>> Inner purely negative queries match nothing.  A query is about matching,
>> and skipping over things that don’t match.  The fix is when using
>> (-something) to do (*:* -something) to match everything and skip the
>> negative clause items.
>> 
>> In your example, try fq=((*:* -documentTypeId:3) AND companyId:29096)
>> 
>>Erik
>> 
>>> On Jun 13, 2017, at 3:15 AM, abhi Abhishek <abhi26...@gmail.com> wrote:
>>> 
>>> Hi Everyone,
>>> 
>>>   I have hit a weird behavior of Boolean Query, when I am
>>> running the query with below param’s  it’s not behaving as expected. can
>>> you please help me understand the behavior here?
>>> 
>>> 
>>> 
>>> q=*:*&fq=((-documentTypeId:3)+AND+companyId:29096)&version=2.2&start=0&rows=10&indent=on&debugQuery=true
>>> 
>>> => Returns 0 matches
>>> 
>>> filter_queries: ((-documentTypeId:3) AND companyId:29096)
>>> 
>>> parsed_filter_queries: +(-documentTypeId:3) +companyId:29096
>>> 
>>> 
>>> 
>>> q=*:*&fq=(-documentTypeId:3+AND+companyId:29096)&version=2.2&start=0&rows=10&indent=on&debugQuery=true
>>> 
>>> => Returns 1600 matches
>>> 
>>> filter_queries:(-documentTypeId:3 AND companyId:29096)
>>> 
>>> parsed_filter_queries:-documentTypeId:3 +companyId:29096
>>> 
>>> 
>>> 
>>> Can you please help me understand what am I missing here?
>>> 
>>> 
>>> Thanks in Advance.
>>> 
>>> 
>>> Thanks & Best Regards,
>>> 
>>> Abhishek
>> 
>> 



Re: Odd Boolean Query behavior in SOLR 3.6

2017-06-13 Thread Erik Hatcher
Inner purely negative queries match nothing.  A query is about matching, and 
skipping over things that don’t match.  The fix is when using (-something) to 
do (*:* -something) to match everything and skip the negative clause items.

In your example, try fq=((*:* -documentTypeId:3) AND companyId:29096)

Erik

> On Jun 13, 2017, at 3:15 AM, abhi Abhishek  wrote:
> 
> Hi Everyone,
> 
>I have hit a weird behavior of Boolean Query, when I am
> running the query with below param’s  it’s not behaving as expected. can
> you please help me understand the behavior here?
> 
> 
> 
> q=*:*&fq=((-documentTypeId:3)+AND+companyId:29096)&version=2.2&start=0&rows=10&indent=on&debugQuery=true
> 
> => Returns 0 matches
> 
> filter_queries: ((-documentTypeId:3) AND companyId:29096)
> 
> parsed_filter_queries: +(-documentTypeId:3) +companyId:29096
> 
> 
> 
> q=*:*&fq=(-documentTypeId:3+AND+companyId:29096)&version=2.2&start=0&rows=10&indent=on&debugQuery=true
> 
> => Returns 1600 matches
> 
> filter_queries:(-documentTypeId:3 AND companyId:29096)
> 
> parsed_filter_queries:-documentTypeId:3 +companyId:29096
> 
> 
> 
> Can you please help me understand what am I missing here?
> 
> 
> Thanks in Advance.
> 
> 
> Thanks & Best Regards,
> 
> Abhishek





Re: question

2017-01-16 Thread Erik Hatcher
Or a no-slop PhraseQuery, where order also matters. 
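
A small sketch of the no-slop case (field name is invented; PhraseQuery.Builder is the 5.x+ API):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.PhraseQuery;

    public class ExactPhraseExample {
        public static PhraseQuery build() {
            return new PhraseQuery.Builder()
                .add(new Term("body", "sas"))        // terms must be adjacent...
                .add(new Term("body", "institute"))  // ...and in this order
                .setSlop(0)                          // 0 = no gaps, no reordering
                .build();
        }
    }

With slop 0, "sas institute" and "institute sas" are no longer interchangeable.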

   Erik

> On Jan 16, 2017, at 12:27, Markus Jelsma  wrote:
> 
> Yes, they should be the same unless the field is indexed with shingles, in 
> that case order matters.
> Markus 
> 
> -Original message-
>> From:Julius Kravjar 
>> Sent: Monday 16th January 2017 18:20
>> To: java-user@lucene.apache.org
>> Subject: question
>> 
>> May I have one question? One company - we used their sw - talked to us that
>> in Lucene it is normal that the search results for
>> 
>> 1.
>> "sas institute"
>> "institute sas"
>> are the same.
>> 
>> 2.
>> sas institute
>> institute sas
>> are the same
>> 
>> 3.
>> the number of search results for "sas institute" is smaller than for sas institute
>> (analogously, "institute sas" is smaller than institute sas)
>> 
>> 
>> 
>> Should we believe them? Many thanks in advance.
>> 
>> Best regards
>> 
>> J. Kravjar
>> 
> 



Re: Combination of BooleanQuery and PhraseQuery

2016-08-15 Thread Erik Hatcher
Try combining into multiple clauses… (with q.op=OR)

   “some phrase”~  OR (some phrase) 

That would boost docs with proximity, but still allow matches for docs 
that don’t contain all terms.
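
In code, that combination is roughly the following (field name and slop are invented; BooleanQuery.Builder and PhraseQuery.Builder are the 5.x+ API):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.PhraseQuery;
    import org.apache.lucene.search.TermQuery;

    public class ProximityBoostExample {
        public static BooleanQuery build() {
            // "some phrase"~100 : only matches when both terms appear, and
            // scores higher the closer together they are
            PhraseQuery near = new PhraseQuery.Builder()
                .add(new Term("body", "some"))
                .add(new Term("body", "phrase"))
                .setSlop(100)
                .build();
            // All SHOULD clauses: a doc missing one term still matches via the
            // bare terms; docs with both terms in proximity pick up the extra
            // phrase score.
            return new BooleanQuery.Builder()
                .add(near, Occur.SHOULD)
                .add(new TermQuery(new Term("body", "some")), Occur.SHOULD)
                .add(new TermQuery(new Term("body", "phrase")), Occur.SHOULD)
                .build();
        }
    }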

Erik



> On Aug 15, 2016, at 4:02 AM, Erel Uziel  wrote:
> 
> Hi,
> Is there any query similar to BooleanQuery with SHOULD semantics that prefer
> documents where the terms are close to each other?
> I currently use a PhraseQuery with large slop for this. However this only
> works if all the terms are in the document.
> 
> Best regards, 
> Erel Uziel
> 





Re: multi valued facets

2015-06-04 Thread Erik Hatcher
Set the field to multiValued=true in your schema.  How'd you manage to get 
multiple values in there without an indexing error?   An existing index built 
with Lucene directly?
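
If the index is in fact built directly with Lucene's facet module (that exception text looks like it comes from FacetsConfig), the equivalent knob is FacetsConfig.setMultiValued. A rough 5.x-era sketch (directories, analyzer, and dimension values are all invented):

    import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.facet.FacetField;
    import org.apache.lucene.facet.FacetsConfig;
    import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    public class MultiValuedFacetExample {
        public static void main(String[] args) throws Exception {
            Directory indexDir = new RAMDirectory();
            Directory taxoDir = new RAMDirectory();
            IndexWriter writer = new IndexWriter(indexDir,
                new IndexWriterConfig(new WhitespaceAnalyzer()));
            DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir);

            FacetsConfig config = new FacetsConfig();
            config.setMultiValued("Role Name", true);  // allow the dimension more than once per doc

            Document doc = new Document();
            doc.add(new FacetField("Role Name", "admin"));
            doc.add(new FacetField("Role Name", "editor"));
            writer.addDocument(config.build(taxoWriter, doc));  // no "not multiValued" exception now

            writer.close();
            taxoWriter.close();
        }
    }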

Erik 

 On Jun 4, 2015, at 17:27, Fielder, Todd Patrick tpfi...@sandia.gov wrote:
 
 I am trying to add a facet for which each document can have multiple values, 
 but am receiving the following exception:
 dimension "Role Name" is not multiValued, but it appears more than once in 
 this document
 
 How do I create a MultiValued Facet?
 
 Thanks in advance




Re: Custom Relevancy Using Field Payloads

2013-11-29 Thread Erik Hatcher
I think what you want is a PayloadTermQuery in the mix.  There's some initial 
stuff here: https://issues.apache.org/jira/browse/SOLR-1485

Erik

On Nov 27, 2013, at 12:55 PM, Furkan KAMACI furkankam...@gmail.com wrote:

 Hi;
 
 I've asked same question at Solr mail list but could not get any answer. I
 have a payload field at my schema (Solr 4.5.1) When a user searches for a
 keyword I will calculate the usual score and if a match occurs at that
 payload field I will add payload to the general score (payload * normalize
 coefficient)
 
 How can I do that? Custom payload similarity class or custom function
 query?
 
 I've followed here:
 http://sujitpal.blogspot.com/2013/07/porting-payloads-to-solr4.html#! but
 decodeNormValue is a final method now. How about this:
 http://www.solrtutorial.com/custom-solr-functionquery.html
 
 Any ideas about my aim?
 
 Thanks;
 Furkan KAMACI





Re: classic.QueryParser - bug or new behavior?

2013-05-19 Thread Erik Hatcher
Just a thought - this looks like it could be due to the regexp (/pattern/ 
syntax) support added, but that was added in Lucene 4.0 so it doesn't quite fit 
that it would be a difference between 4.1 and 4.2.1.  
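
One workaround (4.x classic parser; field name and query text are invented) is to escape only the offending term rather than the whole input, so quoted phrases elsewhere keep working:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.util.Version;

    public class EscapeTermExample {
        public static Query parse() throws Exception {
            QueryParser parser = new QueryParser(Version.LUCENE_42, "contents",
                new StandardAnalyzer(Version.LUCENE_42));
            // Escape just the term containing the slash; real phrase syntax is untouched.
            String term = QueryParser.escape("20110920/EXPIRED");  // -> 20110920\/EXPIRED
            return parser.parse("\"John Smith\" AND " + term);
        }
    }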

Erik

On May 19, 2013, at 14:50 , Scott Smith wrote:

 I just upgraded from lucene 4.1 to 4.2.1.  We believe we are seeing some 
 different behavior.
 
 I'm using org.apache.lucene.queryparser.classic.QueryParser.  If I pass the 
 string 20110920/EXPIRED (w/o quotes) to the parser, I get:
 
 org.apache.lucene.queryparser.classic.ParseException: Cannot parse 
 '20110920/EXPIRED': Lexical error at line 1, column 17.  Encountered: <EOF> 
 after : "/EXPIRED"
   at 
 org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:131)
 
 We believe this used to work.
 
 I tried googling for this and found something that said I should use 
 QueryParser.escape() on the string before passing it to the parser.  However, 
 that seems to break phrase queries (e.g., "John Smith" - with the quotes; I 
 assume it's escaping the double-quotes and doesn't realize it's a phrase).
 
 Since it is a forward slash, I'm confused why it would need escaping of any 
 of the characters in the string with the /EXPIRED.
 
 Has anyone seen this?
 
 Scott





[WEBINAR] - Lucene/Solr 4 – A Revolution in Enterprise Search Technology

2013-03-26 Thread Erik Hatcher
Excuse the blatant marketing, though for the benefit of the community...

http://programs.lucidworks.com/Solr4032013_signuppage.html

Join me tomorrow/today (March 27) for a webinar on what's new and improved in 
Lucene and Solr 4.

It's the last call to register.   Help me break the webinar system by 
overloading it for the presentation!  (and it'll be recorded and shared 
afterwards with those that registered)

Erik



-- [official description]

Lucene/Solr 4 is a ground breaking shift from previous releases. Solr 4.0 
dramatically improves scalability, performance, reliability, and flexibility. 
Lucene 4 has been extensively upgraded. It now supports  near real-time (NRT) 
capabilities that allow indexed documents to be rapidly visible and searchable. 
Additional Lucene improvements include pluggable scoring, much faster fuzzy and 
wildcard querying, and vastly improved memory usage. 

The improvements in Lucene have automatically made Solr 4 substantially better. 
But Solr has also been considerably improved and magnifies these advances with 
a suite of new “SolrCloud” features that radically improve scalability and 
reliability. 

In this Webinar, you will learn:
  * What are the Key Feature Enhancements of Lucene/Solr 4, including the new 
distributed capabilities of SolrCloud
  * How to Use the Improved Administrative User Interface
  * How Sharding has been improved
  * What are the improvements to GeoSpatial Searches, Highlighting, Advanced 
Query Parsers, Distributed search support, Dynamic core management, Performance 
statistics, and searches for rare values, such as Primary Key

Presenter:
Erik Hatcher, Lucene/Solr Committer and PMC member

Erik Hatcher is the co-author of Lucene in Action as well as co-author of 
Java Development with Ant. Erik has been an active member of the Lucene 
community, a leading Lucene/Solr committer, member of the   Lucene/Solr Project 
Management Committee, member of the Apache Software Foundation as well as a 
frequent invited speaker at various industry events. Erik earned his B.S. in 
Computer Science from University of   Virginia, Charlottesville, VA.



Re: ApacheCon meetup

2013-02-19 Thread Erik Hatcher
I've added a Lucene meetup to the Wednesday night meetup proposed schedule.  
I'm speaking on Wednesday morning.  

Let's get the word spread to the Portland tech community as well, making it a 
good way to bring in folks in the area that may not be also attending ApacheCon.

Erik

On Feb 19, 2013, at 13:35 , Chris Hostetter wrote:

 
 : Subject: ApacheCon meetup
 : 
 : Any other Lucene/Solr enthusiasts attending ApacheCon in Portland next week?
 
 I won't make it to ApacheCon this year (first time in a long time 
 actually) but I'm fairly certain there will be a Lucene MeetUp of some 
 kind -- there always is.
 
 This is usually organized via the ApacheCon wiki, so interested 
 participants should sign up there...
 
 https://wiki.apache.org/apachecon/CommunityEventsNA13
 https://wiki.apache.org/apachecon/ApacheMeetupsNA13
 
 
 -Hoss
 



Lucene Revolution conference - May 7-10, Boston

2012-02-24 Thread Erik Hatcher
Lucene Revolution will be here May 9-10 in Boston (with training classes 
offered on May 7-8). Reserve your spot today with Early Bird pricing of $575. 
Committers and accepted speakers are entitled to free admission. The CFP is 
open and we’re actively seeking submissions from the Community.

Submit your proposal by March 9 at the link below:
http://www.lucidimagination.com/devzone/events/submit-your-proposal-lucene-revolution-2012
 .
 
Learn more about the conference at: lucenerevolution.org
 
Hot topics of interest this year are:
 
- Lucene and Solr in the Enterprise (case studies, implementation, return 
on investment, etc.)
- "How We Did It" Development Case Studies
- Big Data
- Relevance in Practice
- Spatial/Geo search
- Lucene and Solr in the Cloud
- Scalability and Performance Tuning
- Large Scale Search
- Real Time Search
- Data Integration/Data Management
- Tika, Nutch and Mahout
- Faceting and Categorization
- Lucene & Solr for Mobile Applications
- Multi-language Support
- Indexing and Analysis Techniques
- Advanced Topics in Lucene & Solr Development






Re: Problem using custom-separator in UpdateCSV ( in solr )

2012-01-08 Thread Erik Hatcher
\t doesn't work in my shell as a tab replacement character.  And Solr doesn't 
expand this sort of thing for you.

  $ echo foo\tbar
  foo\tbar

Try a real tab character instead.  Though more realistically you'll be using a 
file instead, so you won't have to be concerned with a shell for this.

This thread really belongs on the solr-user though.

Erik

On Jan 8, 2012, at 04:57 , prasenjit mukherjee wrote:

 I am trying to add document to a slor index via  :
 
 $ curl "http://localhost:8983/solr/update/csv?commit=true&fieldnames=id,title_s&separator=%09"
 --data "Doc1\tTitle1" -H 'Content-type:text/plain; charset=utf-8'
 
 Solr doesn't seem to recognize the \t in the content, and is failing
 with following error :
 
 <p>Problem accessing /solr/update/csv. Reason:
 <pre>CSVLoader: input=null, line=0, expected 2 values but got 1
   values={'docdocitle',}</pre></p><hr /><i><small>Powered by
 Jetty://</small></i><br/>
 
 What will be the curl command if I want to use a non-comma separator ?
 
 -Thanks,
 Prasenjit
 



Re: Phonetic search with Lucene 3.2

2011-11-09 Thread Erik Hatcher
Solr has, for a long while, included a PhoneticFilter that can leverage several 
different algorithms.  This was pulled down to Lucene, but only for trunk/4.0. 

Maybe use Solr instead?!  ;) 

Erik

On Nov 9, 2011, at 02:29 , Felipe Carvalho wrote:

 Using PerFieldAnalyzerWrapper seems to be working for what I need!
 
 On indexing:
 
PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(new
 StandardAnalyzer(Version.LUCENE_33));
wrapper.addAnalyzer(nome, new MetaphoneReplacementAnalyzer());
IndexWriterConfig indexWriterConfig = new
 IndexWriterConfig(Version.LUCENE_33, wrapper);
Directory directory = FSDirectory.open(new File(indexPath));
IndexWriter indexWriter = new IndexWriter(directory,
 indexWriterConfig);
 
 On search:
 
    Directory directory = FSDirectory.open(new
 File(lastIndexDir(Calendar.getInstance())));
IndexSearcher is = new IndexSearcher(directory);
PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(new
 StandardAnalyzer(Version.LUCENE_33));
wrapper.addAnalyzer(name, new MetaphoneReplacementAnalyzer());
QueryParser parser = new QueryParser(Version.LUCENE_33, name,
 wrapper);
Query query = parser.parse(expression);
ScoreDoc[] hits = is.search(query, 1000).scoreDocs;
 
 Does anyone know any other phonetic analyzer implementation? I'm using
 MetaphoneReplacementAnalyzer from LIA examples.
 
 I'm looking at lucene-contrib stuff at
 http://lucene.apache.org/java/3_4_0/lucene-contrib/index.html but I can't
 seem to find other phonetic analyzers.
 
 Thanks!
 
 
 On Tue, Nov 8, 2011 at 12:19 PM, Erik Hatcher erik.hatc...@gmail.comwrote:
 
 
 On Nov 8, 2011, at 05:42 , Felipe Carvalho wrote:
 Yes, quite possible, including boosting on exact matches if you want.
 Use
 a BooleanQuery to wrap clauses parsed once with phonetic analysis, and
 once
 without, including fields at indexing time for both too of course.
 
 
 Would it be possible to point to an example where this is done. The best
 example of a BooleanQuery I've found so far is this one:
 
 http://www.avajava.com/tutorials/lessons/how-do-i-combine-queries-with-a-boolean-query.html
 
 But I couldn't find a boolean query using different analyzers for
 different
 fields of the attribute.
 
 You could use two different QueryParser instances with different
 analyzers.  Or use the PerFieldAnalyzerWrapper, though you'll still need to
 instances in order to have a different default field for each expression.
 But then use the techniques you saw in that article (or in Lucene in
 Action, since you mentioned having that) to combine Query objects into a
 BooleanQuery.
 
   Erik
 
 



Re: Phonetic search with Lucene 3.2

2011-11-09 Thread Erik Hatcher

On Nov 9, 2011, at 05:11 , Felipe Carvalho wrote:

 Can I use Solr as a lib, like Lucene? My company is not willing to install
 a Solr server... =/

That's too bad.  What's the rationale for that decision?  A large number of big 
big companies are deploying on Solr quite happily.  I just taught a Solr class 
here at ApacheCon with attendees from several recognized company names.  One 
attendee has already scaled to billions of documents!

But yes, you can run Solr as a library using EmbeddedSolrServer.  It operates 
much like Lucene in that regard in terms of having an API to index documents 
and search, with configuration being done with Solr's schema, etc mechanisms.

Erik





Re: Phonetic search with Lucene 3.2

2011-11-08 Thread Erik Hatcher

On Nov 8, 2011, at 03:58 , Felipe Carvalho wrote:

 One other question: I'm looking at Lucene 3.4 javadocs (
 http://lucene.apache.org/java/3_4_0/api/core/index.html) but I can't find
 MetaphoneReplacementAnalyzer anywhere. Does any one know if this class has
 been removed from lucene-core.

That class is in Lucene in Action's companion code, not Lucene itself.  
Download it from http://www.manning.com/lucene

 My Lucene In Action edition is from 2004, so I'm guessing things kinda
 changed since then.

There's a second edition out now, well worth getting if I do say so myself :)  
(I've learned a lot from reading and re-reading it myself, to be honest - 
thanks MikeM!)

 Now suppose my document had a particular field I don't want to be
 metaphoned in the search, for instance, exactName. For example, suppose
 I want to look for all documents whose contents phonetically match "kool
 kat" and whose exactName matches "kat" but not "cat", generating an expression like
 this: exactName:kat AND contents:"kool kat".
 
 Is it possible to do this? If so, how would I do it? Can I use specific
 analyzers for each field?

Yes, quite possible, including boosting on exact matches if you want.  Use a 
BooleanQuery to wrap clauses parsed once with phonetic analysis, and once 
without, including fields at indexing time for both too of course.

Erik






Re: Phonetic search with Lucene 3.2

2011-11-08 Thread Erik Hatcher
Felipe -

Look at the other Lucene JARs available.  lucene-analyzers, I think is where it 
is:

   
http://search.maven.org/#search|ga|1|g%3A%22org.apache.lucene%22%20AND%20v%3A%223.4.0%22

Personally, I'd download Lucene 3.4 release from Apache and use the JARs from 
there.

Erik


On Nov 8, 2011, at 04:16 , Felipe Carvalho wrote:

 Thanks, Erik!
 
 I'm looking at lucene-all javadocs, and there are some interesting classes
 (specifically I'd like to use
 org.apache.lucene.analysis.br.BrazilianAnalyzer). I'm able to find
 lucene-core on http://search.maven.org/, but is there a lucene-all
 published on some maven repo? or should I get those contrib classes out of
 some other dependency?
 
 Thanks!
 
 On Tue, Nov 8, 2011 at 10:06 AM, Erik Hatcher erik.hatc...@gmail.comwrote:
 
 
 On Nov 8, 2011, at 03:58 , Felipe Carvalho wrote:
 
 One other question: I'm looking at Lucene 3.4 javadocs (
 http://lucene.apache.org/java/3_4_0/api/core/index.html) but I can't
 find
 MetaphoneReplacementAnalyzer anywhere. Does any one know if this class
 has
 been removed from lucene-core.
 
 That class is in Lucene in Action's companion code, not Lucene itself.
 Download it from http://www.manning.com/lucene
 
 My Lucene In Action edition is from 2004, so I'm guessing things kinda
 changed since then.
 
 There's a second edition out now, well worth getting if I do say so myself
 :)  (I've learned a lot from reading and re-reading it myself, to be honest
 - thanks MikeM!)
 
 Now suppose my document had a particular field I don't want to be
 metaphones one the search, for instance, exactName. For example,
 suppose
 I want to look for all documents which contents phonetically match kool
 kat and exactName match kat but not cat, generating an expression
 like
 this: exactName:kat AND contents:kool kat.
 
 Is it possible to do this? If so, how would I do it? Can I use specific
 analyzers for each field?
 
 Yes, quite possible, including boosting on exact matches if you want.  Use
 a BooleanQuery to wrap clauses parsed once with phonetic analysis, and once
 without, including fields at indexing time for both too of course.
 
   Erik
 
 
 



Re: Phonetic search with Lucene 3.2

2011-11-08 Thread Erik Hatcher

On Nov 8, 2011, at 05:42 , Felipe Carvalho wrote:
 Yes, quite possible, including boosting on exact matches if you want.  Use
 a BooleanQuery to wrap clauses parsed once with phonetic analysis, and once
 without, including fields at indexing time for both too of course.
 
 
 Would it be possible to point to an example where this is done. The best
 example of a BooleanQuery I've found so far is this one:
 http://www.avajava.com/tutorials/lessons/how-do-i-combine-queries-with-a-boolean-query.html
 
 But I couldn't find a boolean query using different analyzers for different
 fields of the attribute.

You could use two different QueryParser instances with different analyzers.  Or 
use the PerFieldAnalyzerWrapper, though you'll still need two instances in order 
to have a different default field for each expression.  But then use the 
techniques you saw in that article (or in Lucene in Action, since you mentioned 
having that) to combine Query objects into a BooleanQuery.
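
Roughly, with two parsers it might look like this (3.x API; field names are invented, and MetaphoneReplacementAnalyzer is the Lucene in Action companion-code class already mentioned in this thread, so its import is omitted):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.util.Version;

    public class ExactPlusPhoneticExample {
        public static Query build(String exactText, String phoneticText) throws Exception {
            QueryParser exactParser = new QueryParser(Version.LUCENE_33, "exactName",
                new StandardAnalyzer(Version.LUCENE_33));
            QueryParser phoneticParser = new QueryParser(Version.LUCENE_33, "contents",
                new MetaphoneReplacementAnalyzer());  // from the LIA companion code

            BooleanQuery combined = new BooleanQuery();
            combined.add(exactParser.parse(exactText), Occur.MUST);       // e.g. kat
            combined.add(phoneticParser.parse(phoneticText), Occur.MUST); // e.g. "kool kat"
            return combined;
        }
    }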

Erik





Re: Bet you didn't know Lucene can...

2011-10-25 Thread Erik Hatcher
At the group where I worked at UVa once upon a time, a coworker built Juxta, 
this way cool tool to diff multiple versions of a document visually with heat 
maps and difference-o-meters, and it leverages Lucene analyzers to extract 
words and positions and such.

You can find it here: http://www.juxtasoftware.org/

Erik



On Oct 22, 2011, at 05:11 , Grant Ingersoll wrote:

 Hi All,
 
 I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..." 
 (http://na11.apachecon.com/talks/18396).  It's based on my observation that 
 over the years, a number of us in the community have done some pretty cool 
 things using Lucene that don't fit under the core premise of full text 
 search.  I've got a fair number of ideas for the talk (easily enough for 1 
 hour), but I wanted to reach out to hear your stories of ways you've (ab)used 
 Lucene and Solr to see if we couldn't extend the conversation to a bit more 
 than the conference and also see if I can't inject more ideas beyond the ones 
 I have.  I don't need deep technical details, but just high level use case 
 and the basic insight that led you to believe Lucene could solve the problem.
 
 Thanks in advance,
 Grant
 
 
 Grant Ingersoll
 http://www.lucidimagination.com
 
 





Re: Higher rank for closer matches

2011-09-21 Thread Erik Hatcher
PhraseQuery suffices for the stated requirement of boosting when query terms 
are closer.  A common technique is to incorporate a PhraseQuery with a large 
slop factor of the query terms into the query automatically, which implicitly 
boosts matching documents when the query terms are closer.  A SpanNearQuery 
would work too, but a PhraseQuery might be easier to incorporate and will be 
faster performing.

Erik

On Sep 21, 2011, at 05:31 , Akos Tajti wrote:

 Thanks, I will check SpanNearQuery!
 
 Regards,
 Ákos
 
 
 
 
 On Wed, Sep 21, 2011 at 2:20 PM, Em mailformailingli...@yahoo.de wrote:
 
 Àkos,
 
 have a look at SpanNearQuery. This is what you want.
 If you own the 2nd Edition of Lucene in Action have a look at their
 examples. It illustrates how to combine them with the classical queries.
 
 Regards,
 Em
 
 Am 21.09.2011 13:46, schrieb Akos Tajti:
 Dear List,
 
 for multi-term expressions I'd like to add higher rank if the matches are
 closer to each other. For example, for the search term "like eating" the
 string "i like eating" comes before "I like some eating".
 
 Is this possible?
 
 Thanks in advance,
 
 Ákos Tajti
 
 



Re: Higher rank for closer matches

2011-09-21 Thread Erik Hatcher
SpanNearQuery does more work than PhraseQuery - it keeps track of all matching 
spans, whereas PhraseQuery does not.  Whether the performance difference will 
be relevant depends on your environment and data - so it may not be a big deal 
at all.

Erik


On Sep 21, 2011, at 10:44 , Em wrote:

 Hi Erik,
 
 could you explain why PhraseQuery performs better than SpanNearQuery?
 
 Some time has passed since I read about it, however I think it was
 exactly the other way round.
 
 Thanks!
 
 Em
 
 Am 21.09.2011 15:56, schrieb Erik Hatcher:
 PhraseQuery suffices for the stated requirement of boosting when query terms 
 are closer.  A common technique is to incorporate a PhraseQuery with a large 
 slop factor of the query terms into the query automatically, which 
 implicitly boosts matching documents when the query terms are closer.  A 
 SpanNearQuery would work too, but a PhraseQuery might be easier to 
 incorporate and will be faster performing.
 
  Erik
 
 On Sep 21, 2011, at 05:31 , Akos Tajti wrote:
 
 Thanks, I will check SpanNearQuery!
 
 Regards,
 Ákos
 
 
 
 
 On Wed, Sep 21, 2011 at 2:20 PM, Em mailformailingli...@yahoo.de wrote:
 
 Àkos,
 
 have a look at SpanNearQuery. This is what you want.
 If you own the 2nd Edition of Lucene in Action have a look at their
 examples. It illustrates how to combine them with the classical queries.
 
 Regards,
 Em
 
 Am 21.09.2011 13:46, schrieb Akos Tajti:
 Dear List,
 
 for multi term expressions I'd like to add higher rank if the matches are
 closer to each other. For example for the search term like eating the
 string i like eating comes before I like some eating.
 
 Is this possible?
 
 Thanks in advance,
 
 Ákos Tajti
 
 



Re: Higher rank for closer matches

2011-09-21 Thread Erik Hatcher
SpanNearQuery is a different kind of beast than PhraseQuery... it matches when 
its nested SpanQuery's are in proximity.  So it is like multiple PhraseQuery's 
and checking proximities of those with one another... or proximity with any 
other type of SpanQuery.
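
For illustration, a small sketch (3.x-era span API; field, terms, and slop are invented):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    public class SpanNearExample {
        public static SpanNearQuery build() {
            SpanQuery[] clauses = new SpanQuery[] {
                new SpanTermQuery(new Term("body", "like")),
                new SpanTermQuery(new Term("body", "eating"))
            };
            // slop of 3 positions between the clauses; inOrder=true requires
            // "like" to appear before "eating"
            return new SpanNearQuery(clauses, 3, true);
        }
    }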


On Sep 21, 2011, at 11:08 , Em wrote:

 Thanks, Erik.
 If PhraseQuery does not keep track of all matching spans, how does it do
 its work (in comparison to SpanNearQuery)?
 
 Regards,
 Em
 
 Am 21.09.2011 19:52, schrieb Erik Hatcher:
 SpanNearQuery does more work than PhraseQuery - it keeps track of all 
 matching spans, whereas PhraseQuery does not.  Whether the performance 
 difference will be relevant depends on your environment and data - so it may 
 not be a big deal at all.
 
  Erik
 
 
 On Sep 21, 2011, at 10:44 , Em wrote:
 
 Hi Erik,
 
 could you explain why PhraseQuery performs better than SpanNearQuery?
 
 Some time has passed since I read about it, however I think it was
 exactly the other way round.
 
 Thanks!
 
 Em
 
 Am 21.09.2011 15:56, schrieb Erik Hatcher:
 PhraseQuery suffices for the stated requirement of boosting when query 
 terms are closer.  A common technique is to incorporate a PhraseQuery with 
 a large slop factor of the query terms into the query automatically, which 
 implicitly boosts matching documents when the query terms are closer.  A 
 SpanNearQuery would work too, but a PhraseQuery might be easier to 
 incorporate and will be faster performing.
 
Erik
 
 On Sep 21, 2011, at 05:31 , Akos Tajti wrote:
 
 Thanks, I will check SpanNearQuery!
 
 Regards,
 Ákos
 
 
 
 
 On Wed, Sep 21, 2011 at 2:20 PM, Em mailformailingli...@yahoo.de wrote:
 
 Àkos,
 
 have a look at SpanNearQuery. This is what you want.
 If you own the 2nd Edition of Lucene in Action have a look at their
 examples. It illustrates how to combine them with the classical queries.
 
 Regards,
 Em
 
 Am 21.09.2011 13:46, schrieb Akos Tajti:
 Dear List,
 
 for multi term expressions I'd like to add higher rank if the matches 
 are
 closer to each other. For example for the search term like eating the
 string i like eating comes before I like some eating.
 
 Is this possible?
 
 Thanks in advance,
 
 Ákos Tajti
 
 



[Commercial training announcement] Lucene training at Lucene EuroCon, Barcelona - Oct. 17,18, 2011

2011-09-12 Thread Erik Hatcher
http://www.lucidimagination.com/blog/2011/09/12/learn-lucene/ - pasted below too

Hi everyone... I'm not usually much on advertising/hyping events where I speak 
and teach, but I'm really interested in drumming up a solid attendance for our 
Lucene training that I'll be teaching at Lucene EuroCon in Barcelona next 
month.  We always fill up the Solr trainings, but we all know that Lucene is 
the heart of Solr and I'm happy to be immersing myself once again at the Lucene 
layer to teach this class.

I'm looking forward to seeing some of you next month at our very exciting 
EuroCon event! - http://2011.lucene-eurocon.org/pages/training#lucene-workshop

Erik



You’re using Solr, or some other Lucene-based search solutions, … or you should 
and will be!  You are (or will be) building your solutions on top of a 
top-notch search library, Apache Lucene.

Solr makes using Lucene easier – you can index a variety of data sources 
easily, pretty much out of the box, and you can easily integrate features such 
as faceting, highlighting, and spellchecking – all without writing Java code. 
And if that’s all you need and it works solidly for you, awesome! You can stop 
reading now and attend one of our other excellent training courses that fit 
your needs. But if you are a tinkerer and want to know what makes Solr shine, 
or if you need some new or improved feature read on…

Deeper down, Lucene is cranking – analyzing, buffering, and indexing your 
documents, merging segments, parsing queries, caching data structures, rapidly 
hopping around an inverted index, computing scores, navigating finite state 
machines, and much more.

So how do you go about learning Lucene deeper? I’d be remiss not to mention 
Lucene in Action, as it’s the most polished and well crafted documentation 
available on the Lucene library. And of course there’s the incredibly vibrant 
and helpful Lucene open source community. Those resources will serve you well, 
but there’s no substitute for live, interactive, personal training to get you 
up to speed fast with best practices.

I’m in the process of overhauling our Lucene training course, that I’ll 
personally be delivering at Lucene EuroCon 2011 in Barcelona next month. This 
new and improved course takes an activity-based approach to learning and using 
Lucene’s API, beginning with the common tasks in building solutions using 
Lucene, whether you’re building directly to Lucene’s API or you’re writing 
custom components for Solr.

One area that I’m particularly jazzed about teaching is “query parsing”, the 
process of taking a user (or machine’s) search request and turning it into the 
appropriate underlying Lucene Query object instance.  Many folks developing 
with Lucene are familiar with Lucene’s QueryParser.  But did you know there are 
a couple of other query parsers with special powers?  There’s the surround 
query parser, enabling sophisticated proximity SpanQuery clauses.  And there’s 
the mysterious “XML query parser” (don’t let the ugly sounding name dissuade 
you) that slots dynamic query parameters, such as coming from an “advanced 
search” request, into a tree structured query template.   There’s some more 
insight into the world of Lucene query parsers an “Exploring Query Parsers” 
blog post.

What about all the Lucene contrib module activity in the Lucene 3.x releases?  
 Here’s a bit of the goodness: better Unicode handling with the ICU 
tokenizers and filters, improved stemming, and many other analysis 
improvements, field grouping/collapsing, and block join/query for handling 
particular parent/child relationships.

Come learn the latest about the amazing Lucene library at Lucene EuroCon!  You, 
your boss, and your projects will all be glad you did.



Re: Help needed on Ant build script for creating Lucene index

2011-05-12 Thread Erik Hatcher
There's an example build file, see 
https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/ant/example.xml

It's pretty outdated stuff there though.  It has some flexibility for a custom 
document handler in order to allow full control over how a File gets turned 
into a Lucene Document.   By default it'll handle text documents (.txt) and 
.html files using jtidy.

You're better off these days, IMO, to use Solr and its Tika integration for 
creating an index from rich files.

Erik


On May 12, 2011, at 01:04 , Saurabh Gokhale wrote:

 Hi,
 
 Can someone pls direct me to an example where I can get help on creating ant
 build script for creating lucene index?. It is part of Lucene contrib but I
 did not get much idea from the documentation on Lucene site.
 
 Thanks
 
 Saurabh





[infomercial] Lucene Refcard at DZone

2011-03-29 Thread Erik Hatcher
I've written an "Understanding Lucene" refcard that has just been published at 
DZone.  See here for details:

   
http://www.lucidimagination.com/blog/2011/03/28/understanding-lucene-by-erik-hatcher-free-dzone-refcard-now-available/

If you're new to Lucene or Solr, this refcard will be a nice grounding in the 
fundamental concepts.  For you old timers, pass it on to your friends and 
coworkers :)

Erik





Re: Sort results by number of document fields

2011-01-31 Thread Erik Hatcher

On Jan 31, 2011, at 10:51 , Azhar Jassal wrote:
 How can I use Lucene to sort search results by the number of fields each
 document has? (highest to lowest - documents with more fields in my index
 are better results)

When you know you need to query on something you have available during indexing 
time, make your life easy and index it!  In other words, index the number of 
(other) fields as a numeric into a num_fields field or something like that.

This could be done automatically if you were to write a custom update processor 
and add it to the update processing chain, but easy enough to do in most custom 
indexers I've ever come across as well.
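
A rough 3.x-style sketch of that idea (the num_fields name and the way the count is taken are invented):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.NumericField;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TopDocs;

    public class FieldCountSortExample {
        // index time: record how many other fields the document carries
        public static void addFieldCount(Document doc) {
            int count = doc.getFields().size();
            doc.add(new NumericField("num_fields", Field.Store.YES, true)
                .setIntValue(count));
        }

        // search time: sort descending on that numeric field
        public static TopDocs search(IndexSearcher searcher, Query query) throws Exception {
            Sort byFieldCount = new Sort(new SortField("num_fields", SortField.INT, true));
            return searcher.search(query, null, 10, byFieldCount);
        }
    }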

 Also my fields are named as URI's i.e. 
 http://www.w3.org/2000/01/rdf-schema#label, how should I form queries with
 field names containing such syntax? How shall I pass them in, escaped?

Good luck on this one... you'll have to contend with escaping (with a 
backslash) if you're using the lucene query parser, and perhaps other related 
headaches with other query parsers depending on how they do their thing 
underneath.

Erik





NoVA/DC - Lucene/Solr Meetup - Wednesday, Nov. 17

2010-11-15 Thread Erik Hatcher
We still have some open spots for the meetup we're hosting this Wednesday night 
in DC.  Come on out, it'll be a great time.

Erik

http://www.lucidimagination.com/blog/2010/11/01/nova-dc-apache-lucenesolr-meetup-630-pm-et-17-november/







Re: ApacheCon Meetup in Atlanta

2010-10-18 Thread Erik Hatcher
Count me in for any kind of Lucene/Solr hanging out in Atlanta.

Erik


On Oct 18, 2010, at 14:57 , Grant Ingersoll wrote:

 Is there interest in having a Meetup at ApacheCon?  Who's going?  Would 
 anyone like to present?  We could do something less formal, too, and just 
 have drinks and Q&A/networking.  Thoughts?
 
 -Grant
 
 



Free Webinar: Findability: Designing the Search Experience

2010-08-12 Thread Erik Hatcher

Here's perhaps the coolest webinar we've done to date, IMO :)

I attended Tyler's presentation at Lucene EuroCon* and thoroughly  
enjoyed it.  Search UI/UX is a fascinating topic to me, and really  
important to do well for the applications most of us are building.


I'm pleased to pass along the blurb below.  See you there!

Erik

* http://lucene-eurocon.org/sessions-track2-day2.html#3



Lucid Imagination presents a free webinar
Wednesday, August 18, 2010 10:00 AM PST / 1:00 PM EST / 19:00 CET
Sign up at http://www.eventsvc.com/lucidimagination/081810?trk=ap

You don't need billions of dollars or users to build a user-friendly  
search application. In fact, studies of how and why people search have  
revealed a set of principles that can  result in happy users who find  
what they're seeking with as little friction as possible -- and help  
you build a better, more successful search application.


Join special guest Tyler Tate, user experience designer at UK-based  
TwigKit Search, for a high-level discussion of key user interface  
strategies for search that can be leveraged with Lucene and Solr. The  
presentation covers:

* Ten things to know about designing the search experience
* When to assume users know what they’re looking for – and when not to
* Navigation/discovery techniques, such as faceted navigation, tag  
clouds, histograms and more
* Practical considerations in leveraging suggestions into search  
interactions


Lucid Imagination presents a free webinar
Wednesday, August 18, 2010 10:00 AM PST / 1:00 PM EST / 19:00 CET
Sign up at http://www.eventsvc.com/lucidimagination/081810?trk=ap

About the presenter: Tyler Tate is co-founder of TwigKit, a UK-based  
company focused on building truly usable interfaces for search. Tyler  
has led user experience design for enterprise applications from CMS to  
CRM, and is the creator of the popular 1KB CSS Grid. Tyler also  
organizes a monthly Enterprise Search Meetup in London, and blogs at  
blog.twigkit.com.


-
Join the Revolution!
Don't miss Lucene Revolution
Lucene & Solr User Conference
Boston | October 7-8 2010
http://lucenerevolution.org
-

This webinar is sponsored by Lucid Imagination, the commercial entity  
exclusively dedicated to Apache Lucene/Solr open source search  
technology. Our solutions can help you develop and deploy search  
solutions with confidence: SLA-based support subscriptions,  
professional training, best practices consulting, along with value- 
add software and free documentation and certified distributions of  
Lucene and Solr.


Apache Lucene and Apache Solr are trademarks of the Apache  
Software Foundation.




Re: understanding lucene

2010-08-09 Thread Erik Hatcher

An even better URL:  http://www.manning.com/lucene  :)

Erik


On Aug 8, 2010, at 6:19 AM, Uwe Schindler wrote:


Hi Yakob,

In this mailing list are all the people who wrote this book, making  
such a
suggestion is not a good idea, especially if you need help in  
future. You

cannot get everything for free. If you look through the internet (e.g.
twitter, blogs of authors) you may find Coupon Codes / Promotional  
Codes

(e.g. 25% less). Be sure to buy the second edition, my first link was
incorrect: http://www.manning.com/hatcher3

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de



-Original Message-
From: Yakob [mailto:jacob...@opensuse-id.org]
Sent: Sunday, August 08, 2010 11:55 AM
To: java-user@lucene.apache.org
Subject: Re: understanding lucene

On 8/8/10, Uwe Schindler u...@thetaphi.de wrote:

The example code you found is very old (seems to be from the Version
1.x of Lucene), and is not working with Version 2.x or 3.x of Lucene
(previously deprecated Hits class is gone in 3.0, static Field
constructors were gone long time ago in 2.0, so you get compilation

errors).


If you want to learn Lucene, buy the Book Lucene in Action - 2nd
Edition, there is everything explained and lots of examples for
everyday use with the newest Version 3.0.2. See
http://www.manning.com/hatcher2/ for ordering the PDF version or  
go to

your local bookstore.


In all cases, if you are new to Lucene don't use version 2.9.x or
earlier, use 3.0.x with its clean API. This makes it easier for

beginners.


Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de



the ebook cost 30 dollars,can't I just get the free pirate version

instead?hehe... I
mean if you had the ebook yourself maybe you can email me the pdf  
version

to

my email here.so that it would not cost me money. :-)

or maybe I can find it in rapidshare,maybe there is someone kind  
enough

that

put the book there.
--
http://jacobian.web.id

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



[webinar] Rapid Prototyping Search Applications with Solr

2010-06-07 Thread Erik Hatcher
Marketing blurb below.  My personal hype here... I'm going to be  
showcasing a straightforward document search engine, from files  
through indexing through usable user interface in no time.  Come check  
it out.  There'll be similarities to my EuroCon presentation[1],  
though this will be an entirely custom built (building it as we  
speak!) application for the webinar.


Erik



-

Want to get up and running with Apache Solr quickly and easily?

Join Erik Hatcher, Apache Solr and Lucene committer and co-founder of  
Lucid Imagination, for a workshop on getting started with Solr, the  
Lucene Enterprise Search Server. Erik shows you how to use LucidWorks  
for Solr, to iteratively work your way from raw data to full featured  
search application, complete with features such as faceting,  
highlighting, and spellchecking. In this hands-on technical  
presentation, you'll also learn about


  * Tips for adjusting Solr's schema to more tightly match your needs
  * Powerful prototyping tools for use with Solr
  * Showing your data in a flexible search user interface


http://events.lucidimagination.com/home/listing/tabid/64/listingkey/7/rapid_prototyping_search_applications_with_solr.aspx 





[1] http://lucene-eurocon.org/sessions-track2-day2.html#4

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Free Webinar: Implementing Solr open source search in a .NET and DBMS environment; Thurs 27 May 13:00 GMT (9a EDT)

2010-05-23 Thread Erik Hatcher
I'd like to invite you to tune in to a great talk I saw at Apache  
Lucene Eurocon (www.lucene-eurocon.org) this past week in Prague by Bo  
Raun, of Nordjyske Medier. The talk was on how he discovered Solr and  
introduced it successfully in an IT environment whose strategy  
otherwise totally rests on Microsoft technologies (and other  
traditional commercial solutions, like Citrix, VMware, etc). Bo also  
discusses the contrast between a traditional relational database  
background and the different development outlook he needed, along with  
discoveries, surprises, and lessons learned, from which newcomers to  
Solr from commercial search and DBMS backgrounds might benefit.


Today Nordjyske Medier uses Solr for an editorial archive, and for an  
online yellow page directory, with new plans for Solr technologies in  
the near future to incorporate advanced search features like geosearch  
and Solr integration with an ontology engine.


If your timezone has to wake up too early in the day for this, you can  
always sign up now and view it at your convenience, as it'll be cached  
for playback.


You can sign up here: http://bit.ly/bU0LKs
Implementing Solr open source search in a dot-net and DBMS environment;
Thurs 27 May 13:00 GMT (9a EDT)

Thanks, Erik
www.lucidimagination.com

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: merge results from physically separate hosts

2010-04-26 Thread Erik Hatcher
Solr's distributed search feature is about querying multiple indexes  
and merging the results. Different indexes, but same schema.
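As a rough illustration (hypothetical hosts and ports, not from this thread), a
sharded request looks something like:

   http://localhost:8983/solr/select?q=ipod&shards=server1:8983/solr,server2:8983/solr

Each shard serves the same schema; Solr queries every shard listed and merges
the partial results into one response.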


Erik

On Apr 25, 2010, at 6:02 AM, Shaun Senecal wrote:


Is there currently a way to take a query, run it on multiple hosts
containing different indexes, then merge the results from each host to
present to the user?  It looks like Solr can handle multiple hosts
supporting the same index, but my case requires each index to be
different.

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



[WEBINAR] Practical Search with Solr: Beyond just looking it up

2010-04-21 Thread Erik Hatcher
Below is the official announcement for our exciting upcoming webinar.   
This one is near and dear to my heart, so I'll be eagerly listening  
too, and participating with my experiences as it fits with the flow of  
the webinar.


I'm a card-carrying library geek, and I've had the pleasure of working  
alongside Bess Sadler at the University of Virginia as we began the  
Blacklight[1] project.  Now at Stanford, she and Naomi Dushay continue  
to shine Solr on collections of various types.  And Tom Burton-West  
and the Hathi Trust project have put Lucene and Solr through some very  
heavy challenges, breaking Lucene's limits even! (and ultimately  
working to alleviate them).  Tom's project is massive in scale and  
importance, with OCR challenges in various languages.   I've  
been blessed to know all three of these folks personally thanks to the  
code4lib community in which I continue to participate.


  [1] http://projectblacklight.org/ - a vibrant open source project  
using Ruby on Rails as an elegant and flexible discovery front-end for  
Solr.


---

Forget the dust jackets and Aunt Shirley in 'Reference': libraries  
today are on the screaming edge of Lucene/Solr implementations. Solr  
applications for library data offer practical, general purpose  
solutions to some of the knottiest search problems: daunting indexes,  
metadata whipped into shape, deep field faceting, data types a mile  
wide, phrase queries that will make your head spin, unforgiving  
relevancy for the masses, even OCR. Join Stanford's Bess Sadler and  
Naomi Dushay, plus Tom Burton-West of the Hathi Trust Project, for a  
walking tour of how Solr can tame the wildest of search challenges,  
including:


• Strategies for dealing with non-normalized data
• Leveraging field data and user experience for improved relevancy
• Tackling big and unexpected phrase queries

Join us for a free webinar
Thursday, April 29, 2010
11:00 AM PDT / 2:00 PM EDT
Go here to sign up: 
http://www.eventsvc.com/lucidimagination/042910?trk=WR-APR2010B-AP

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: lucene webinterface

2010-02-19 Thread Erik Hatcher

Again, try LIMO.

But what do you mean, no success with Solr?  Please elaborate on the  
issues you encountered and what you tried.


Erik

On Feb 19, 2010, at 2:41 PM, luciusvorenus wrote:



No success with Solr.

Does anybody have another suggestion?



luciusvorenus wrote:



I already have the data indexed (a database table) and also I have a class
to search... just simple.
I would like just a search box ...

Thank u


polx wrote:



On 16-févr.-10, at 17:40, luciusvorenus wrote:

how can I build a web interface for my application?  I read
something about HTML tables and PHP but I had no idea.
Can anybody help me?


Lucius,

try solr.

paul
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org








--
View this message in context: 
http://old.nabble.com/lucene-webinterface-tp27611202p27659293.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: lucene webinterface

2010-02-17 Thread Erik Hatcher
Solr can front your Lucene index, and via Solritas[1] it can provide a  
simple and customizable basic UI.


Though to stick with pure Lucene, give LIMO[2] a try.

Erik

[1] 
http://www.lucidimagination.com/blog/2009/11/04/solritas-solr-1-4s-hidden-gem/

[2] http://limo.sourceforge.net/


On Feb 17, 2010, at 5:48 AM, luciusvorenus wrote:
I already have the data indexed (a database table) and also I have a class
to search... just simple.
I would like just a search box ...

Thank u


polx wrote:



On 16-févr.-10, at 17:40, luciusvorenus wrote:

how can I build a web interface for my application?  I read
something about HTML tables and PHP but I had no idea.
Can anybody help me?


Lucius,

try solr.

paul
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





--
View this message in context: 
http://old.nabble.com/lucene-webinterface-tp27611202p27621946.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?

2010-02-01 Thread Erik Hatcher
QueryParser has a special capability to lowercase wildcard and prefix  
queries, simply because they are not passed to an analyzer.  Term  
queries, phrase queries (like your example), etc are passed on to the  
analyzer.  You are using the KeywordAnalyzer for the title field, and  
thus it is not lowercased.  Choose a different analyzer that  
lowercases and it will.
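For illustration, a rough sketch against the 2.x-era classic QueryParser
(untested here, and the setLowercaseExpandedTerms call is from memory, so
verify it against your version) would be:

   PerFieldAnalyzerWrapper wrapper =
       new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_CURRENT));
   wrapper.addAnalyzer("title", new KeywordAnalyzer());
   QueryParser parser = new QueryParser("title", wrapper);
   // keep wildcard/prefix terms exactly as typed instead of lowercasing them
   parser.setLowercaseExpandedTerms(false);
   Query query = parser.parse("title:BBB*");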


Erik

On Feb 1, 2010, at 1:10 PM, java8964 java8964 wrote:



I would like to confirm your reply. You mean that the query parser
will do the lower casing? In fact, it looks like it only does this for
wildcard queries, right?


For the term query, it didn't, as proved if you change the line to:

    Query query = new QueryParser("title",
wrapper).parse("title:\"BBB CCC\"");


You will get 1 hit back. So in this case, the query parser class
behaves differently for term queries and wildcard queries.


We have to use the query parser in this case, but we have our own
query parser class that extends the Lucene query parser class.
Is there anything we can do about it?


Will Lucene's query parser class be fixed for the above
inconsistent implementation?


Thanks



From: u...@thetaphi.de
To: java-user@lucene.apache.org
Subject: RE: During the wild card search, will lucene 2.9.0 to  
convert the search string to lower case?

Date: Mon, 1 Feb 2010 17:41:08 +0100

Only query parser does the lower casing. For such a special case, I  
would suggest to use a PrefixQuery or WildcardQuery directly and  
not use query parser.


-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


-Original Message-
From: java8964 java8964 [mailto:java8...@hotmail.com]
Sent: Monday, February 01, 2010 5:27 PM
To: java-user@lucene.apache.org
Subject: During the wild card search, will lucene 2.9.0 to convert  
the

search string to lower case?


I noticed a strange result from the following test case. For wildcard
search, my understanding is that Lucene will NOT use any analyzer on
the query string. But as the following simple code shows, it looks
like Lucene will lower case the search query in the wildcard
search. Why? If not, why does the following test case show one search
hit for the lower case wildcard search, but not for the upper case
data?

My original data is NOT analyzed, so they should be stored as the
original data in the index segment, right?

Lucene version: 2.9.0

JDK version: JDK 1.6.0_17


public class IndexTest1 {
   public static void main(String[] args) {
   try {
   Directory directory = new RAMDirectory();
   IndexWriter writer = new IndexWriter(directory, new
StandardAnalyzer(Version.LUCENE_CURRENT),
IndexWriter.MaxFieldLength.UNLIMITED);
    Document doc = new Document();
    doc.add(new Field("title", "BBB CCC", Field.Store.YES,
Field.Index.NOT_ANALYZED));
    writer.addDocument(doc);
    doc = new Document();
    doc.add(new Field("title", "ddd eee", Field.Store.YES,
Field.Index.NOT_ANALYZED));
    writer.addDocument(doc);

    writer.close();

    IndexSearcher searcher = new IndexSearcher(directory,
true);
    PerFieldAnalyzerWrapper wrapper = new
PerFieldAnalyzerWrapper(new
StandardAnalyzer(Version.LUCENE_CURRENT));

    wrapper.addAnalyzer("title", new KeywordAnalyzer());
    Query query = new QueryParser("title",
    wrapper).parse("title:BBB*");
    System.out.println("hits of title = " +
searcher.search(query, 100).totalHits);
    query = new QueryParser("title",
    wrapper).parse("title:ddd*");
    System.out.println("hits of title = " +
searcher.search(query, 100).totalHits);
   searcher.close();
   } catch (Exception e) {
   System.out.println(e);
   }
   }
}

The output:
hits of title = 0
hits of title = 1


_
Hotmail: Trusted email with powerful SPAM protection.
http://clk.atdmt.com/GBL/go/201469227/direct/01/



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



_
Hotmail: Powerful Free email with security by Microsoft.
http://clk.atdmt.com/GBL/go/201469230/direct/01/



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Question about relevance

2010-01-08 Thread Erik Hatcher
One technique I've seen commonly used is to index both stemmed and  
unstemmed fields, and during search query both and boost the unstemmed  
field matches higher.
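A rough sketch of the query side (field names hypothetical, untested):

   // "title" analyzed/stemmed, "title_exact" indexed without stemming
   TermQuery stemmed = new TermQuery(new Term("title", "wall"));
   TermQuery exact = new TermQuery(new Term("title_exact", "wallis"));
   exact.setBoost(5.0f);               // exact-form matches score higher

   BooleanQuery combined = new BooleanQuery();
   combined.add(stemmed, BooleanClause.Occur.SHOULD);
   combined.add(exact, BooleanClause.Occur.SHOULD);

Documents matching the unstemmed form satisfy both clauses and so rank above
documents that only match the stemmed form.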


Erik

On Jan 8, 2010, at 4:05 AM, Yannick Caillaux wrote:


Hi,

I index 2 documents. The first contains the word "Wallis" in the
title field. The second has the same title but "Wallis" is replaced
by "Wall".

I execute the query: title:wallis
During the search, "Wallis" is stemmed by the FrenchAnalyzer and becomes
"wall". So both documents are results for the search.


My problem is: the two results have the same relevance.
I thought that the document containing "Wallis" would have better
relevance because I searched for the word "wallis" and not "wall".


Is relevance calculated from the searched word ("wallis") or from the
analyzed word ("wall")? Is there any solution to get better relevance
for the result "Wallis"?

For information, I'm on Lucene 2.3.2.

Thanks

Yannick



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Migrating to Open-Source Search with Lucene and Solr/ Free Webinar 8-Dec 2pm ET/11amPT/1900 GMT

2009-12-02 Thread Erik Hatcher

WEBINAR:
Hosted by KMWorld and featuring speakers from The Motley Fool and  
Lucid Imagination

Tuesday, Dec 8: 2pm ET/11amPT/1900 GMT
Sign up here: http://www.kmworld.com/webinars/lucid/08dec2009/luc3

Greetings,

I'll be presenting along with some of our customers from Motley Fool  
at a free webinar, sponsored by Lucid Imagination, entitled Take  
Control of Your Search Destiny: Migrating to Apache Solr/Lucene Open  
Source Search .


I'll be joined by Motley Fool Director of Search Danny Hsia and Motley  
Fool Search VP Tech Operation Chad Wolfsheimer, to discuss how The  
Motley Fool took control of their search technology and migrated to  
Lucene/Solr Open Source Search – and how they dramatically improved  
search relevancy, speed, versatility and costs.


You're welcome to join or to forward this to someone who might like to.

Thanks,
Erik
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Webinar: Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever

2009-09-30 Thread Erik Hatcher

Excuse the cross-posting and gratuitous marketing :)

Erik


My company, Lucid Imagination, is sponsoring a free and in-depth  
technical webinar with Erik Hatcher, one of our co-founders as Lucid  
Imagination, as well as co-author of Lucene in Action, and Lucene/Solr  
PMC member and committer. Sign up here: http://www.eventsvc.com/lucidimagination/100909?trk=WR-OCT2009-AP


Friday, October 9th 2009
10:00AM – 11:00AM PDT / 1:00 – 2:00PM EDT

If you’ve got a lot of data to tame in a variety of formats, there’s  
no better, deeper, faster platform to build your search application  
with than Solr. Apache Solr 1.4 expands the power and versatility of  
the leading open source search server, with its convenient web- 
services interfaces and well-packaged server implementation. Erik will  
present and discuss key features and innovations of Solr 1.4,  
covering, among others:


  * Faster, more streamlined document and query processing
  * New powerful search methods including multi-select faceting,  
deduplication and numeric range handling
  * Simplified, powerful, highly-scalable deployment improvements  
with new Java server infrastructure


Sign up for the free webinar at
http://www.eventsvc.com/lucidimagination/100909?trk=WR-OCT2009-AP

About the presenter:
Erik Hatcher, is the co-author of “Lucene in Action” as well as co- 
author of “Java Development with Ant”. Erik has been an active member  
of the Lucene community – a leading Lucene and Solr committer, member  
of the Lucene Project Management Committee, member of the Apache  
Software Foundation as well as a frequent invited speaker at various  
industry events.

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Free Webinar - Apache Lucene 2.9: Technical Overview of New Features

2009-09-18 Thread Erik Hatcher

Free Webinar: Apache Lucene 2.9: Discover the Powerful New Features
---

Join us for a free and in-depth technical webinar with Grant  
Ingersoll, co-founder of Lucid Imagination and chair of the Apache  
Lucene PMC.


Thursday, September 24th 2009
11:00AM - 12 NOON PDT / 2:00 - 3:00PM EDT
Click on the link below to sign up
http://www.eventsvc.com/lucidimagination/092409?trk=WR-SEP2009B-AP

Lucene 2.9 offers a rich set of new features and performance  
improvements alongside plentiful fixes and optimizations. If you are a  
Java developer building search applications with the Lucene search  
library, this webinar provides the insights you need to harness this  
important update to Apache Lucene.


Grant will present and discuss key technical features and innovations  
including:

o Real time/Per segment searching and caching
o Built in numeric range support with trie structure for speed and  
simplified programming

o Reduced search latency and improved index efficiency

Join us for a free webinar.
Thursday, September 24th 2009
11:00 AM - NOON PDT / 2:00 - 3:00 PM EDT
http://www.eventsvc.com/lucidimagination/092409?trk=WR-SEP2009B-AP

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Lucene in Action Rev2

2009-08-26 Thread Erik Hatcher

I've pinged Manning to get this corrected.  Thanks for the heads-up.

Erik

On Aug 26, 2009, at 5:58 PM, tsuraan wrote:


In the free first chapter of the new Lucene in Action book, it states
that it's targeting Lucene 3.0, but on the Manning page for the book,
it says the code in the book is written for 2.3.  I'm guessing that
the book is the authority on what the book covers, but could somebody
maybe change the Manning page to reflect that?

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Lucene Meetup - September 3, Mountain View, CA

2009-08-25 Thread Erik Hatcher

Announcing a new Meetup for SFBay Apache Lucene/Solr Meetup!

What: SFBay Apache Lucene/Solr September Meetup
When: September 3, 2009 6:30 PM
Where: Computer History Museum, 1401 N Shoreline Blvd, Mountain View,  
CA 94043


Presentations and discussions on Lucene/Solr, the Apache Open Source  
Search Engine/Platform -- featuring:
	• Lucene Search Performance Analysis: Andrzej Bialecki, Nutch  
Committer / Luke author
	• Can you find what they found? Solr @ Digg.com: Sammy Yu, Digg.com  
Search Development

• Looking at Solr Relevancy: Mark Bennett, New Idea Engineering
	• Search at Netflix and beyond: Walter Underwood, Search Veteran-- 
Infoseek, Verity, Netflix
	• Innovations in search and social media: Brian Pinkerton, Chief  
Architect, Lucid Imagination

More talks posted shortly!

Presentations followed by Lightning Talks from community members.  
Lightning talks open for registration soon.


We'll have some food and beverages. Questions? Contact 
ta...@lucidimagination.com

Learn more here: 
http://www.meetup.com/SFBay-Lucene-Solr-Meetup/calendar/11157028/
See other meetups at 
http://www.lucidimagination.com/Community/Marketplace/Meetups
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Group by in Lucene ?

2009-08-02 Thread Erik Hatcher

Don't overlook Solr: http://lucene.apache.org/solr

Erik

On Aug 1, 2009, at 5:43 AM, mschipperheyn wrote:



http://code.google.com/p/bobo-browse

looks like it may be the ticket.

Marc

--
View this message in context: 
http://www.nabble.com/Group-by-in-Lucene---tp13581760p24767693.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: dbsight

2009-04-30 Thread Erik Hatcher


On Apr 30, 2009, at 10:32 PM, Michael Masters wrote:
Sweet! I'll look more into solr. I wasn't under the impression solr  
could index a database like dbsight.


It's not point-and-clickable, but Solr's DataImportHandler has  
sophisticated configuration capabilities for indexing any JDBC  
accessible database.


And there is also the LuSql project that has recently gotten a lot of  
good press, and I've seen it demo'd first hand; it's quite powerful and  
flexible.


Erik


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Solr webinar

2009-04-20 Thread Erik Hatcher

(excuse the cross-post)

I'm presenting a webinar on Solr.  Registration is limited, so sign up  
soon.  Looking forward to seeing some of you there!


Thanks,
Erik


Got data? You can build your own Solr-powered Search Engine!

Erik Hatcher, Lucene/Solr Committer and author, will show you how
to use Solr to build an Enterprise Search engine that indexes a
variety of data sources all in a matter of minutes!


Thursday, April 30, 2009
11:00AM - 12:00PM PDT / 2:00PM - 3:00PM EDT

Sign up for this free webinar today at
http://www2.eventsvc.com/lucidimagination/?trk=E1

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: ebook resources - including lucene in action

2009-04-20 Thread Erik Hatcher
It is not legal to share purchased e-books in this manner.  Please  
purchase copies of the books you read, otherwise authors have very  
little incentive to dedicate months (14 months in the case of Lucene  
in Action, first edition) of their lives to writing this content.


Erik

On Apr 20, 2009, at 1:58 AM, Saurabh Bhutyani wrote:

Check out this site: www.downloadsearchengine.com  It allows you to search
and download pdf ebooks, ppts, doc, mp3, torrents, rapidshare links etc.
 Original message 
From: wu fuheng wufuh...@gmail.com
Date: 20 Apr 09 09:28:56
Subject: ebook resources including lucene in action
To: nutchu...@lucene.apache.org
welcome to download 



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Regex Search

2009-04-13 Thread Erik Hatcher


On Apr 13, 2009, at 5:41 AM, Seid Mohammed wrote:

I want to include regular expression based searching in my Lucene
application.

Anyone who can help?


There is a RegexQuery and a SpanRegexQuery available in Lucene's regex  
contrib:


http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/regex/package-summary.html 



The test cases show some example usages:
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/regex/src/test/org/apache/lucene/search/regex/TestRegexQuery.java 



Erik


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Sequential match query

2009-04-12 Thread Erik Hatcher


On Apr 11, 2009, at 9:11 PM, Tim Williams wrote:


On Sat, Apr 11, 2009 at 12:25 PM, Erick Erickson
erickerick...@gmail.com wrote:

That'll teach me to scan a post. The link I sent you
is still relevant, but wildcards are NOT intended to be used to
concatenate terms. You want a phrase query or a span query
for that. i.e. "A C F"~# where # is the slop, that is, the number
of other terms allowed to appear between your desired terms.

SpanQueries are constructed programmatically, and PhraseQueries
are produced by the parser.


As I understand it though, there's no way to use the queryparser to
construct an *ordered* phrase query with slop (which is what I think
he's after), right?  I gather that'd have to be done manually with a
SpanNearQuery.  I'd love to hear that the query parser has syntax for
this though...


QueryParser does not create any SpanQuery's, but one can subclass  
QueryParser and override getFieldQuery() to put in a SpanQuery instead  
of a PhraseQuery.
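A rough sketch of the conversion step (untested; the exact getFieldQuery()
signature shifts between releases, so treat the override point as an
assumption to adapt):

   // call this from a QueryParser subclass when super.getFieldQuery(...)
   // hands back a PhraseQuery
   private static SpanNearQuery toOrderedSpan(PhraseQuery pq) {
     Term[] terms = pq.getTerms();
     SpanQuery[] clauses = new SpanQuery[terms.length];
     for (int i = 0; i < terms.length; i++) {
       clauses[i] = new SpanTermQuery(terms[i]);
     }
     return new SpanNearQuery(clauses, pq.getSlop(), true); // true = in order
   }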


Erik


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Sequential match query

2009-04-12 Thread Erik Hatcher


On Apr 12, 2009, at 8:15 AM, Tim Williams wrote:


On Sun, Apr 12, 2009 at 5:56 AM, Erik Hatcher
e...@ehatchersolutions.com wrote:


On Apr 11, 2009, at 9:11 PM, Tim Williams wrote:


On Sat, Apr 11, 2009 at 12:25 PM, Erick Erickson
erickerick...@gmail.com wrote:


That'll teach me to scan a post. The link I sent you
is still relevant, but wildcards are NOT intended to be used to
concatenate terms. You want a phrase query or a span query
for that. i.e. "A C F"~# where # is the slop, that is, the number
of other terms allowed to appear between your desired terms.

SpanQueries are constructed programmatically, and PhraseQueries
are produced by the parser.


As I understand it though, there's no way to use the queryparser to
construct an *ordered* phrase query with slop (which is what I think
he's after), right?  I gather that'd have to be done manually with a
SpanNearQuery.  I'd love to hear that the query parser has syntax  
for

this though...


QueryParser does not create any SpanQuery's, but one can subclass
QueryParser and override getFieldQuery() to put in a SpanQuery  
instead of a

PhraseQuery.


Thanks Erik, wouldn't this be an all-or-nothing replacement though?
In other words, by creating ordered SpanNearQuery's as the override,
wouldn't he lose the current unordered PhraseQuery+slop
functionality?  I haven't seen a way to subclass the QueryParser to
support both (e.g. extend the syntax)?


As always, it depends.  If the QueryParser subclass has a switch to  
toggle between SpanNearQuery and PhraseQuery it could controlled by  
the code which way to go.  But yeah, it's not currently possible to  
extend the syntax of QueryParser with a subclass.  There is a nice new  
open issue with a new query parser implementation that is vastly more  
flexible - we'll see that come in to Lucene in the near future.


Erik


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Speed of fuzzy searches

2009-04-03 Thread Erik Hatcher


On Apr 3, 2009, at 10:58 AM, Grant Ingersoll wrote:
Now, we have an implementation of JaroWinkler in the spell checker  
(in fact, we have pluggable distance measures there), perhaps it  
makes sense to think about how FuzzyQuery could leverage this  
pluggability?


My suggestion is to make it pluggable like the RegexQuery makes the  
regular expression engine pluggable.  *cough* interfaces ;)  ala  
RegexCapabilities and RegexQueryCapable
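For reference, those pluggable measures sit behind the spell checker's
StringDistance interface (class names from org.apache.lucene.search.spell, as
I recall them -- a sketch, not a spec):

   StringDistance jaroWinkler = new JaroWinklerDistance();
   StringDistance levenstein = new LevensteinDistance();
   // getDistance() returns a similarity in [0,1]; 1.0f is an exact match
   float a = jaroWinkler.getDistance("lucene", "lucine");
   float b = levenstein.getDistance("lucene", "lucine");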


Erik




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Erik Hatcher
With the caveat that if you're not storing the text you want  
highlighted, you'll have to retrieve it somehow and send it into the  
Highlighter yourself.
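A rough sketch of that hand-off using the contrib Highlighter (untested;
'query', 'analyzer', and 'text' are assumed from your own context, and
exceptions/signatures vary a bit by release):

   // 'text' is the document body you retrieved from your own store
   Highlighter highlighter = new Highlighter(new QueryScorer(query));
   String snippet = highlighter.getBestFragment(analyzer, "contents", text);
   // snippet comes back null if the query didn't match this text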


Erik

On Mar 7, 2009, at 5:40 AM, Michael McCandless wrote:



You should look at contrib/highlighter, which does exactly this.

Mike

Amin Mohammed-Coleman wrote:


Hi
I am currently indexing documents (PDF, MS Word, etc.) that are uploaded.
These documents can be searched, and what the search returns to the user are
summaries of the documents.  Currently the summaries are extracted when
indexing the file (a summary constructed by taking the first 10 lines of the
document and stored in the index as a field).  This is not ideal (a static
summary), and I was wondering if it would be possible to create a dynamic
summary when a hit is found and highlight the terms found.  The content of
the document is not stored in the index.

So basically what I'm looking to do is:

1) PDF indexed
2) PDF body contains the word search
3) Do a search and return the hit
4) Construct a summary with the term search included.

I'm not sure how to go about doing this (I presume it is  
possible).  I would

be grateful for any advice.


Cheers
Amin



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Erik Hatcher

It depends :)

It's a trade-off.  If storing is not prohibitive, I recommend that as  
it makes life easier for highlighting.


Erik

On Mar 7, 2009, at 6:37 AM, Amin Mohammed-Coleman wrote:


Hi,
That's what I was thinking about.  I would need to get the file and extract
the text again and then pass it through the highlighter.  The other option is
storing the content in the index, the downside being that the index is going
to be large.  Which would be the recommended approach?

Cheers

Amin

On Sat, Mar 7, 2009 at 10:50 AM, Erik Hatcher e...@ehatchersolutions.com 
wrote:


With the caveat that if you're not storing the text you want  
highlighted,

you'll have to retrieve it somehow and send it into the Highlighter
yourself.

  Erik


On Mar 7, 2009, at 5:40 AM, Michael McCandless wrote:



You should look at contrib/highlighter, which does exactly this.

Mike

Amin Mohammed-Coleman wrote:

Hi
I am currently indexing documents (pdf, ms word, etc) that are  
uploaded,
these documents can be searched and what the search returns to  
the user

are
summaries of the documents.  Currently the summaries are  
extracted when
indexing the file (summary constructed by taking the first 10  
lines of

the
document and stored in the index as field).  This is not ideal  
(static
summary), and I was wondering if it would be possible to create a  
dynamic
summary when a hit is found and highlight the terms found.  The  
content

of
the document is not stored in the index.

So basically what I'm looking to do is:

1) PDF indexed
2) PDF body contains the word search
3) Do a search and return the hit
4) Construct a summary with the term search included.

I'm not sure how to go about doing this (I presume it is  
possible).  I

would
be grateful for any advice.


Cheers
Amin




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Learning Lucene

2009-03-05 Thread Erik Hatcher


On Mar 5, 2009, at 9:24 AM, Tuztuz T wrote:

Dear all,
I am really new to Lucene.
Is there anyone who can guide me in learning Lucene?
I have the old Lucene in Action book, but I have a hard time reconciling
the syntax in the book with the new Lucene release (2.4).
Can anyone give me a copy of the new Lucene in Action book or any other
material that I can go through?


The second edition is available through Manning's MEAP program  
already. Still some writing left to do on it, and hopefully 2.9 will  
be out first, before it goes to print, but it has been updated to the  
latest API and contains lots of great new material primarily thanks to  
Mike McCandless.


   http://www.manning.com/hatcher3/

Erik


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Confidence scores at search time

2009-03-04 Thread Erik Hatcher


On Mar 4, 2009, at 9:05 AM, Michael McCandless wrote:



I think (?) Explanation.toString() is in fact supposed to return the  
full explanation (not just the first line)?


You're right... I just read the code wrong after seeing the output Ken  
posted originally.


He followed up with a correction:
 http://www.lucidimagination.com/search/document/52363ad81237162f/confidence_scores_at_search_time 



Sorry 'bout that!

Erik





Mike

Ken Williams wrote:





On 3/2/09 1:58 PM, Erik Hatcher e...@ehatchersolutions.com wrote:



On Mar 2, 2009, at 2:47 PM, Ken Williams wrote:
In the output, I get explanations like "0.88922405 = (MATCH) product of:"
with no details.  Perhaps I need to do something different in
indexing?


Explanation.toString() only returns the first line.  You can use
toString(int depth) or loop over all the getDetails().   toHtml()
returns a decently formatted tree of ul's of the whole explanation
also.


It looks like toString(int) is a protected method, and toHtml()  
only seems
to return a single ul with no content.  I can start writing a  
recursive

routine to dive down into getDetails(), but I thought there must be
something easier.

--
Ken Williams
Research Scientist
The Thomson Reuters Corporation
Eagan, MN


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Luke site is down?

2009-03-04 Thread Erik Hatcher


On Mar 4, 2009, at 2:08 PM, Ruslan Sivak wrote:
Is there a separate mailing list for getopt?  Perhaps someone can  
notify the site owner?


I've just sent Andrzej "Luke" Bialecki an e-mail, though I imagine he  
monitors this list too.


Erik


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Confidence scores at search time

2009-03-02 Thread Erik Hatcher


On Mar 2, 2009, at 2:47 PM, Ken Williams wrote:
Finally, I seem unable to get Searcher.explain() to do much useful -  
my code

looks like:

   Searcher searcher = new IndexSearcher(reader);
   QueryParser parser = new QueryParser(LuceneIndex.CONTENT,  
analyzer);

   Query query = parser.parse(queryString);
   TopDocCollector collector = new TopDocCollector(n);
   searcher.search(query, collector);

   for ( ScoreDoc d : collector.topDocs().scoreDocs ) {
   String explanation = searcher.explain(query,  
d.doc).toString();
   Field id =  
searcher.doc( d.doc ).getField( LuceneIndex.ID );
    System.out.println(id + "\t" + d.score + "\t" +
explanation);

   }

In the output, I get explanations like "0.88922405 = (MATCH) product of:"
with no details.  Perhaps I need to do something different in
indexing?


Explanation.toString() only returns the first line.  You can use  
toString(int depth) or loop over all the getDetails().   toHtml()  
returns a decently formatted tree of ul's of the whole explanation  
also.
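A rough sketch of walking the tree yourself with the public accessors
(untested):

   static void print(Explanation e, int depth) {
     for (int i = 0; i < depth; i++) System.out.print("  ");
     System.out.println(e.getValue() + " = " + e.getDescription());
     Explanation[] details = e.getDetails();
     if (details != null) {
       for (int i = 0; i < details.length; i++) {
         print(details[i], depth + 1);   // recurse into sub-explanations
       }
     }
   }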


Erik


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: How build Lucene in Action examples

2009-02-28 Thread Erik Hatcher
Please post questions/issues related to Lucene in Action to Manning's  
Author Online forum at:


   http://www.manning-sandbox.com/forum.jspa?forumID=451

Thanks,
Erik


On Feb 27, 2009, at 6:33 PM, tolkienGR wrote:



Hi !!!
I'm new to Lucene. I started reading Lucene in Action (first
edition), and I downloaded the code from
http://www.manning.com/hatcher2/ .
I read somewhere that that code was written for an old version of Lucene and
that I should download the code for the new version from here:
http://www.manning.com/hatcher3/
And so I did...
I still have some problems. For example, in the file Searcher.java I get this
error:
Directory fsDir = new FSDirectory(indexDir, null); -- cannot find symbol:
constructor FSDirectory(...)

I also tried FSDirectory.getDirecory(indexDir, null);

I have tried many versions of Lucene. Any help please!
I also tried FSDirectory.getDirecory(indexDir, null);

I have tried many versions of Lucene.Any helpPlease!
Thanks
--
View this message in context: 
http://www.nabble.com/How-build-Lucene-in-Action-examples-tp22256503p22256503.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Indexer.Java problem

2009-02-21 Thread Erik Hatcher
Also, the first several hits here provide the tricks to update the  
code to the latest API:


   http://www.lucidimagination.com/search/?q=lucene+in+action+examples+update 
  :)


Erik


On Feb 19, 2009, at 10:41 AM, Seid Mohammed wrote:


I am using netbeans on windows to test lucene.
I have added all the lib files from the /lib directory to my project  
library.

down the end of Indexer.java program, it states the Field.Text method
is not available
the error message is as follows
---

C:\backup\msc\year2sem1\JavSrc\Lucene\src\Indexer.java:18: duplicate
class: lia.meetlucene.Indexer
public class Indexer {
C:\backup\msc\luceninaction\LuceneInAction\src\lia\meetlucene 
\Indexer.java:80:

cannot find symbol
symbol  : method Text(java.lang.String,java.io.FileReader)
location: class org.apache.lucene.document.Field
   doc.add(Field.Text(contents, new FileReader(f)));
C:\backup\msc\luceninaction\LuceneInAction\src\lia\meetlucene 
\Indexer.java:81:

cannot find symbol
symbol  : method Keyword(java.lang.String,java.lang.String)
location: class org.apache.lucene.document.Field
   doc.add(Field.Keyword(filename, f.getCanonicalPath()));
Note: C:\backup\msc\luceninaction\LuceneInAction\src\lia\meetlucene 
\Indexer.java

uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
3 errors
BUILD FAILED (total time: 2 seconds)
---
what is wrong?
it underlines in red for the folowing code
=
   Document doc = new Document();
   doc.add(Field.Text(contents, new FileReader(f)));
   doc.add(Field.Keyword(filename, f.getCanonicalPath()));
   writer.addDocument(doc);
===

seid m
--
RABI ZIDNI ILMA

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Faceted search with OpenBitSet/SortedVIntList

2009-02-08 Thread Erik Hatcher


On Feb 8, 2009, at 3:32 AM, Raffaella Ventaglio wrote:


Hi Chris,

The SortedVIntList approach is similar to field cache. It's better  
to use
the fieldcache for the facet search, which is the normal approach  
and

used
in tools like Solr, DBSight, Bobo Browse Engine, etc.



Thanks for your answer, I did not know about FieldCache.
However, I think I cannot use it to solve my problem because, as I  
said in
my previous post, a lot of my facets are not related to a value  
on a
single field, but can be configured by the user by writing a complex  
boolean

query.


And this is also the reason why I think I cannot use Solr to  
implement this kind of faceted search.


Solr also supports facet queries... such that a count of matching  
documents within a constrained subset is returned for each facet.query  
provided.
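For example (a hypothetical request, URL-encoding omitted for readability),
each facet.query gets its own count within the current result set:

   http://localhost:8983/solr/select?q=*:*&facet=true
       &facet.query=price:[0 TO 100]
       &facet.query=price:[100 TO *]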


Erik

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Registration for ApacheCon Europe 2009 is now open!

2009-01-29 Thread Erik Hatcher
Cross-posting this announcement.  There are several relevant Lucene/ 
Solr talks including:


Trainings
  - Lucene Boot Camp (Grant Ingersoll)
  - Solr Boot Camp (Erik Hatcher)

Sessions
  - Introducing Apache Mahout (Grant)
  - Lucene Case Studies (Erik)
  - Advanced Indexing Techniques with Apache Lucene (Michael Busch)

And a whole slew of Hadoop/cloud coverage.

Erik




--

ApacheCon EU 2009 registration is now open!
23-27 March -- Mövenpick Hotel, Amsterdam, Netherlands
http://www.eu.apachecon.com/


Registration for ApacheCon Europe 2009 is now open - act before early
bird prices expire 6 February.  Remember to book a room at the Mövenpick
and use the Registration Code: Special package attendees for the
conference registration, and get 150 Euros off your full conference
registration.

Lower Costs - Thanks to new VAT tax laws, our prices this year are 19%
lower than last year in Europe!  We've also negotiated a Mövenpick rate
of a maximum of 155 Euros per night for attendees in our room block.

Quick Links:

  http://xrl.us/aceu09sp  See the schedule
  http://xrl.us/aceu09hp  Get your hotel room
  http://xrl.us/aceu09rp  Register for the conference

Other important notes:

- Geeks for Geeks is a new mini-track where we can feature advanced
technical content from project committers.  And our Hackathon on Monday
and Tuesday is open to all attendees - be sure to check it off in your
registration.

- The Call for Papers for ApacheCon US 2009, held 2-6 November
2009 in Oakland, CA, is open through 28 February, so get your
submissions in now.  This ApacheCon will feature special events with
some of the ASF's original founders in celebration of the 10th
anniversary of The Apache Software Foundation.

  http://www.us.apachecon.com/c/acus2009/

- Interested in sponsoring the ApacheCon conferences?  There are plenty
of sponsor packages available - please contact Delia Frees at
de...@apachecon.com for further information.

==
ApacheCon EU 2008: A week of Open Source at it's best!

Hackathon - open to all! | Geeks for Geeks | Lunchtime Sessions
In-Depth Trainings | Multi-Track Sessions | BOFs | Business Panel
Lightning Talks | Receptions | Fast Feather Track | Expo... and more!

- Shane Curcuru, on behalf of
 Noirin Shirley, Conference Lead,
 and the whole ApacheCon Europe 2009 Team
 http://www.eu.apachecon.com/  23-27 March -- Amsterdam, Netherlands



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: indexing binary files?

2009-01-29 Thread Erik Hatcher
Are these some type of parsable-into-text binary files that you have a  
parser handy for?


Erik

On Jan 29, 2009, at 10:43 PM, Paul Feuer wrote:


Hi -

I've looked on the FAQ, the Java Docs, and searched a little in
google, but haven't been able to figure out if Lucene can index binary
files.

Our binary files can get up into the 20-30 gigabyte range.

If it is possible, anyone have any pointers to what interfaces I  
should look at?


Thanks,

./paul

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Fwd: [Travel Assistance] Applications for ApacheCon EU 2009 - Now Open

2009-01-23 Thread Erik Hatcher



Begin forwarded message:


From: Tony Stevenson pct...@apache.org
Date: January 23, 2009 8:28:19 AM EST
To: travel-assista...@apache.org
Subject: [Travel Assistance] Applications for ApacheCon EU 2009 -  
Now Open




The Travel Assistance Committee is now accepting applications for  
those
wanting to attend ApacheCon EU 2009 between the 23rd and 27th March  
2009

in Amsterdam.

The Travel Assistance Committee is looking for people who would like  
to

be able to attend ApacheCon EU 2009 who need some financial support in
order to get there. There are very few places available and the  
criteria
is high, that aside applications are open to all open source  
developers

who feel that their attendance would benefit themselves, their
project(s), the ASF or open source in general.

Financial assistance is available for travel, accommodation and  
entrance

fees either in full or in part, depending on circumstances. It is
intended that all our ApacheCon events are covered, so it may be  
prudent
for those in the United States or Asia to wait until an event closer  
to
them comes up - you are all welcome to apply for ApacheCon EU of  
course,
but there must be compelling reasons for you to attend an event  
further
away that your home location for your application to be considered  
above

those closer to the event location.

More information can be found on the main Apache website at
http://www.apache.org/travel/index.html - where you will also find a
link to the online application form.

Time is very tight for this event, so applications are open now and  
will

end on the 4th February 2009 - to give enough time for travel
arrangements to be made.

Good luck to all those that apply.


Regards,
The Travel Assistance Committee
--




--
Tony Stevenson
t...@pc-tony.com  //  pct...@apache.org  // pct...@freenode.net
http://blog.pc-tony.com/

1024D/51047D66 ECAF DC55 C608 5E82 0B5E  3359 C9C7 924E 5104 7D66
--



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Lucene in Action book. Problems with first example

2008-12-16 Thread Erik Hatcher


On Dec 16, 2008, at 6:57 AM, Oleg Oltar wrote:

Also maybe there are some free manuals/articles that you can  
recommend for

starters?


There's a bunch of stuff listed here: http://wiki.apache.org/lucene-java/Resources 



Lucene has been changing so rapidly lately that I'm not aware of any  
articles that are entirely up-to-date API-wise, but again in general  
most of those API changes are pretty minor and actually well  
documented in Lucene's CHANGES.txt and deprecation warnings (upgrading  
from 1.4 to 1.9, for example, primed users with nice deprecation  
warnings with what was going to be removed).


Erik


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Lucene in Action book. Problems with first example

2008-12-16 Thread Erik Hatcher


On Dec 16, 2008, at 5:53 AM, Oleg Oltar wrote:
So is there another manual which I can use to start? (Seems that  
examples in
the book, are carefully chosen for starters, and quite easy to  
understand)


The API differences are all quite minor to adjust to the latest -  
hopefully the post I pointed you to will get you over the problems, or  
the new code download.  Feel free to ask with specifics when you  
encounter issues, either here or on the Manning forum for the book.


Erik


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Lucene in Action book. Problems with first example

2008-12-16 Thread Erik Hatcher
The first edition of Lucene in Action was written for Lucene 1.4.   
Lots has changed since then in the API, but the fundamentals are still  
sound.  The code can be easily updated to the newer API following the  
details I posted here:


   http://markmail.org/message/4jupw4wnjn3gv7wh

Do note that Lucene in Action 2nd edition is in progress and available  
through Manning's early access program here: http://manning.com/hatcher3/ 
, and updated code is available there (it is coded to Lucene's  
2.9/3.0 API).


Erik

On Dec 16, 2008, at 5:41 AM, Oleg Oltar wrote:


Hi!
I am starting to learn Lucene.
I am using the Lucene in Action book to get started (it was recommended to
me). I tried to compile the first example from that book, but my IDE (I use
Eclipse) shows there are some errors in my code. I am just a beginner
here, and I really need to compile at least a few programs before I can solve
problems myself. So I decided to post the whole code here with my comments.
Please help me!


package org.main;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class SimpleIndexer {

  /**
   * @param args
   */
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      throw new Exception("Usage: java " + SimpleIndexer.class.getName()
          + " indexDir dataDir");
    }

    File indexDir = new File(args[0]);
    File dataDir = new File(args[1]);
    long start = new Date().getTime();
    int numIndexed = index(indexDir, dataDir);
    long end = new Date().getTime();
    System.out.println("Indexing " + numIndexed + " took " + (end - start)
        + " milliseconds");
  }

  @SuppressWarnings("deprecation")
  public static int index(File indexDir, File dataDir) throws IOException {
    if (!dataDir.exists() || !dataDir.isDirectory()) {
      throw new IOException(dataDir + " doesn't exist or not a directory");
    }
    IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), true);
    // Not sure why eclipse crosses this
    writer.setUseCompoundFile(false);

    indexDirectory(writer, dataDir);
    int numIndexed = writer.docCount(); // Not sure why eclipse crosses this

    writer.optimize();
    writer.close();

    return numIndexed;
  }

  private static void indexDirectory(IndexWriter writer, File dir)
      throws IOException {
    File[] files = dir.listFiles();
    for (int i = 0; i < files.length; i++) {
      File f = files[i];
      if (f.isDirectory()) {
        indexDirectory(writer, f);
      } else if (f.getName().endsWith(".txt")) {
        indexFile(writer, f);
      }
    }
  }

  private static void indexFile(IndexWriter writer, File f)
      throws IOException {
    if (f.isHidden() || !f.exists() || !f.canRead()) {
      return;
    }
    System.out.println("Indexing " + f.getCanonicalPath());
    Document doc = new Document();
    doc.add(Field.Text("contents", new FileReader(f))); // Eclipse says: The method Text(String, FileReader) is undefined for the type Field
    doc.add(Field.Keyword("filename", f.getCanonicalPath())); // Eclipse says: The method Keyword(String, String) is undefined for the type Field
    writer.addDocument(doc);
  }
}

Please explain to me why these errors are shown, and how to fix them. Maybe
the version of Lucene used by the author of the book contained the needed
methods? Could it be that the book is outdated and can't be used for
learning? If so, please recommend something that can help me to start with
Lucene.


Thanks in advance,
Oleg



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: .NET list?

2008-12-12 Thread Erik Hatcher


On Dec 12, 2008, at 9:43 AM, Ian Vink wrote:
I am using java-user@lucene.apache.org  for help, but sometimes I'd  
like
Lucene.net specific help. Is there a mailing list for Lucene.NET on  
apache?


Yes, see the mail list section here: http://incubator.apache.org/lucene.net/ 



Erik



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Marked for deletion

2008-12-01 Thread Erik Hatcher


On Dec 1, 2008, at 3:28 AM, Ganesh wrote:
I need to index voluminous data and I plan to shard it. The client
may not know which shard to query. The server will take care of
complete shard management. I have done almost 50% of the development
with Lucene.


In the case of Solr, I think the client should be aware of which core or
instance it wants to communicate with?


See http://wiki.apache.org/solr/DistributedSearch

The example shows a shards parameter being sent from a client, yes  
but all Solr parameters can be either specified from the client or set  
in server-side configuration. So no, a client doesn't need to be aware  
of which shards to query.


Erik


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Newbie: MatchAllDocsQuery sample?

2008-12-01 Thread Erik Hatcher


On Dec 1, 2008, at 8:30 AM, Ian Vink wrote:

Is there a simple example on how to query for contents:Hello in all
documents using
MatchAllDocsQuery (http://incubator.apache.org/lucene.net/docs/2.1/Lucene.Net.Search.MatchAllDocsQuery.html)?
I want 100% of the docs with Hello
I want 100% of the docs with Hello


You're looking for a TermQuery, not MatchAllDocsQuery.

Erik


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Marked for deletion

2008-11-25 Thread Erik Hatcher


On Nov 25, 2008, at 5:00 AM, Ganesh wrote:
My index application is a separate process and my search application
is part of the web UI. When the user performs a delete, I want to mark the
document for deletion.


I think I have no option other than to update the document,
but the index app is a separate process and it uses the IndexWriter. In
order to update, I am planning to use RMI and create a single
application which does both indexing and search and also exposes some
search and delete methods.


Is there any other way to achieve this?


Perhaps consider using Solr if you're going to wrap Lucene with some  
sort of service layer.  It already takes care of the bulk of the hard  
stuff that you'd end up having to deal with (warming, etc).


Erik


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [ot] a reverse lucene

2008-11-23 Thread Erik Hatcher


On Nov 22, 2008, at 10:57 PM, Ian Holsman wrote:

Hi. apologies for the off-topic question.


Not off-topic at all!

I was wondering if anyone knew of a open source solution (or a  
pointer to the algorithms)

that do the reverse of lucene.
By that I mean store a whole lot of queries, and run them against a  
document to see which queries match it. (with a score etc)


I can see the case for this would be a news-article and several  
people writing queries to get alerted if it matched a certain  
condition.


This use-case was the reason MemoryIndex was created.  It's a fast  
single document index where incoming documents could be sent in  
parallel to the main index - and slamming a bunch of queries at it.   
There's also InstantiatedIndex to compare to, as it can handle  
multiple documents.
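A rough sketch of the per-document loop with MemoryIndex (untested;
articleText, analyzer, savedQueries, and notify() are hypothetical
placeholders):

   MemoryIndex index = new MemoryIndex();
   index.addField("content", articleText, analyzer);
   for (Query q : savedQueries) {      // the stored alert queries
     float score = index.search(q);    // greater than 0.0f means a match
     if (score > 0.0f) {
       notify(q, score);               // hypothetical alerting hook
     }
   }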


Erik


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Filter or Query

2008-11-21 Thread Erik Hatcher


On Nov 20, 2008, at 11:58 PM, Ganesh wrote:

I am planning to use a Filter for UserID and Date. I will not be able
to cache the Filter; I have to create this filter for every request.
To my knowledge, a Filter will give faster results only if it is
cached.


Is it a good idea to use a filter, or is it better to use a query?


If you use filter queries (fq) in this manner, you will be caching the  
results, and this could have a negative impact on faceting performance  
if you bump facet field/query entries out of the cache.


Having uncached filter queries would be a good idea: https://issues.apache.org/jira/browse/SOLR-407 



But for now, you could simply append your filter to the original  
query string (unless you're using dismax and need a sophisticated  
clause).
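Concretely, the two variants look something like this (field names
hypothetical):

   cached filter:     q=some keywords&fq=userid:42&fq=date:[20081101 TO 20081121]
   appended instead:  q=(some keywords) AND userid:42 AND date:[20081101 TO 20081121]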


Erik


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Boosting results

2008-11-11 Thread Erik Hatcher


On Nov 11, 2008, at 8:32 AM, Stefan Trcek wrote:


On Tuesday 11 November 2008 02:18:39 Erik Hatcher wrote:


The integration won't be too painful... the main thing is that Solr
requires* some configuration files, literally on the filesystem, in
order to fire up and be happy.  And you'll need to craft Solr's
schema.xml to jive with how you indexed with pure Lucene.


Thanks Erik, I will give Solr a try. A list of files and classes I  
have

to use or supply to Solr will be appreciated. For now it is
- EmbeddedSolrServer
- SolrQuery
- schema.xml


Yeah, it'll look something like this: http://svn.apache.org/repos/asf/lucene/solr/branches/solr-ruby-refactoring/examples/solrjruby.rb 



That's JRuby code, but is easily translatable into pure Java.

Erik
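A rough Java equivalent using solrj's EmbeddedSolrServer (a sketch assuming the Solr 1.3-era API; paths, core name, and field names are placeholders, exception handling omitted):

    System.setProperty("solr.solr.home", "/path/to/solr/home");
    CoreContainer container = new CoreContainer.Initializer().initialize();
    EmbeddedSolrServer server = new EmbeddedSolrServer(container, "");

    SolrQuery query = new SolrQuery("title:lucene");
    query.setFacet(true).addFacetField("category");   // facet counts come back with the response
    QueryResponse response = server.query(query);
    System.out.println(response.getResults().getNumFound() + " hits");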





Re: Boosting results

2008-11-10 Thread Erik Hatcher


On Nov 10, 2008, at 2:42 PM, Stefan Trcek wrote:

On Monday 10 November 2008 13:55:31 Michael McCandless wrote:


Finally, you might want to instead look at Solr, which provides facet
counting out of the box, rather than roll your own...


Doooh - new API, but its facet counting sounds good.

Any starting points for moving from plain lucene to Solr in a smooth
way? I doubt whether it is possible to integrate the facet counting
part of Solr into my plain lucene application?


The integration won't be too painful... the main thing is that Solr  
requires* some configuration files, literally on the filesystem, in  
order to fire up and be happy.  And you'll need to craft Solr's  
schema.xml to jive with how you indexed with pure Lucene.


For searching: Do I have to have a Solr server (servlet engine)  
running

or will EmbeddedSolrServer and SolrQuery do the job?


That'll do the job, without a servlet engine.  But a servlet engine  
can be mighty handy when you need to go to distributed search,  
replication, etc.  But one can use Solr very much like using Lucene,  
API-only (but with config files).



For indexing: Can I use a ready to use lucene index in Solr?


Yup, see above.

Erik





Re: robots.txt

2008-10-20 Thread Erik Hatcher


On Oct 20, 2008, at 8:58 AM, Alexander Aristov wrote:
Just wonder if Nutch takes into consideration rules from the  
robots.txt file

while crawling a site.


Wrong e-mail list, but yeah, Nutch supports robots.txt considerations.

Erik





Re: Hiring etiquette

2008-10-19 Thread Erik Hatcher

It's a wiki... create an account and add yourself :)

Erik

On Oct 19, 2008, at 7:10 PM, Cam Bazz wrote:


How can we get on to that list?

Best,

On Mon, Oct 20, 2008 at 1:58 AM, Hasan Diwan [EMAIL PROTECTED]  
wrote:

2008/10/19 Mark Miller [EMAIL PROTECTED]:
You might instead limit your email to those that have agreed to be  
contacted

at http://wiki.apache.org/lucene-java/Support


FWIW, the page indicated is immutable.
--
Cheers,
Hasan Diwan [EMAIL PROTECTED]











Fwd: CFP open for ApacheCon Europe 2009

2008-10-02 Thread Erik Hatcher



Begin forwarded message:


From: Noirin Shirley [EMAIL PROTECTED]
Date: October 2, 2008 4:22:06 AM EDT
To: [EMAIL PROTECTED]
Subject: CFP open for ApacheCon Europe 2009
Reply-To: [EMAIL PROTECTED]
Reply-To: [EMAIL PROTECTED]

PMCs: Please send this on to your users@ lists!

If you only have thirty seconds:

The Call for Papers for ApacheCon Europe 2009, to be held in  
Amsterdam, from 23rd to 27th March, is now open! Submit your  
proposals at http://eu.apachecon.com/c/aceu2009/cfp/ before 24th  
October.


Remember that early bird prices for ApacheCon US 2008, to be held in  
New Orleans, from 3rd to 7th November, will go up this Friday, at  
midnight Eastern time!


Sponsorship opportunities for ApacheCon US 2008 and ApacheCon EU  
2009 are still available. If you or your company are interested in  
becoming a sponsor, please contact Delia Frees at  
[EMAIL PROTECTED] for details.


***

If you want all the details:

ApacheCon Europe 2009 - Leading the Wave of Open Source
Amsterdam, The Netherlands
23rd to 27th March, 2009

Call for Papers Opens for ApacheCon Europe 2009

The Apache Software Foundation (ASF) invites submissions to its  
official conference, ApacheCon Europe 2009. To be held 23rd to 27th  
March, 2009 at the Mövenpick Hotel Amsterdam City Centre, ApacheCon  
serves as a forum for showcasing the ASF's latest developments,  
including its projects, membership, and community. ApacheCon offers  
unparalleled educational opportunities, with dedicated  
presentations, hands-on trainings, and sessions that address core  
technology, development, business/marketing, and licensing issues in  
Open Source.


ApacheCon's wide range of activities are designed to promote the  
exchange of ideas amongst ASF Members, innovators, developers,  
vendors, and users interested in the future of Open Source  
technology. The conference program includes competitively selected  
presentations, trainings/workshops, and a small number of invited  
speakers. All sessions undergo a peer review process by the  
ApacheCon Conference Planning team. The following information  
provides presentation category descriptions, and information about  
how to submit your

proposal.

Conference Themes and Topics

APACHECON 2009 - LEADING THE WAVE OF OPEN SOURCE

Building on the success of the last two years, we are excited to  
return to Amsterdam in 2009. We'll be continuing to offer our very  
popular two-day trainings, including certifications of completion  
for those who fulfill all the requirements of these trainings.


The ASF comprises some of the most active and recognized developers  
in the Open Source community. By bringing together the pioneers,  
developers, and users of flagship Open Source technologies,  
ApacheCon provides an influential platform for dialogue, between the  
speaker and the audience, between project contributors and the  
community at large, traversing a wide range of ideas, expertise, and  
personalities.


ApacheCon welcomes submissions from like-minded delegates across  
many fields, geographic locations, and areas of development. The  
breadth and loosely-structured nature of the Apache community lends  
itself to conference content that is also somewhat loosely- 
structured. Common themes of interest address groundbreaking  
technologies and emerging trends, successful practices (from  
development to deployment), and lessons learned (tips, tools, and  
tricks). In addition to technical content, ApacheCon invites  
Business Track submissions that address Open Source business,  
marketing, and legal/licensing issues.


Topics appropriate for submission to this conference are manifold,  
and may include but are not restricted to:


- Apache HTTP server topics such as installation, configuration, and  
migration
- ASF-wide projects such as Lucene, SpamAssassin, Jackrabbit, and  
Maven
- Scripting languages and dynamic content such as Java, Perl,  
Python, Ruby, XSL, and PHP

- Security and e-commerce
- Performance tuning, load balancing and high availability
- New technologies and broader initiatives such as Web Services and  
Web 2.0

- ASF-Incubated projects such as Sling, UIMA, and Shindig


Submission Guidelines
Submissions must include
- Title
- Speaker name, with affiliation and email address
- Speaker bio (100 words or less)
- Short description (50 words or less)
- Full description including abstract and objectives (200 words or
less)
- Expertise level (beginner to advanced)
- Format and duration (trainings vs. general presentation; half-,  
full- or two-day workshop, etc.)
- Intended audience and maximum number of participants (trainings  
only)

- Background knowledge expected of the participants (trainings only)


Types of Presentations

- Trainings/Workshops
- General Sessions
- Case Studies/Industry Profiles
- Invited Keynotes/Panels/Speakers
- Corporate Showcases & Demonstrations

BoF sessions and Fast Feather Track talks will be selected separately

Pre Conference 

Re: Calculation of fieldNorm causes irritating effect of sort order

2008-10-02 Thread Erik Hatcher


On Oct 2, 2008, at 7:39 AM, Jimi Hullegård wrote:
Is it possible to disable the lengthNorm calculation for particular  
fields?


Yes, use Field#setOmitNorms(true) when indexing.

Erik
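For example (field name and value are placeholders):

    Field title = new Field("title", titleText, Field.Store.YES, Field.Index.TOKENIZED);
    title.setOmitNorms(true);   // no length normalization stored for this field
    doc.add(title);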





case studies

2008-10-01 Thread Erik Hatcher

Dear Lucene and Solr users -

I'm presenting Lucene/Solr Case Studies at ApacheCon in a month: http://us.apachecon.com/c/acus2008/sessions/41 



I would like to feature implementations by YOU.  The thing is, my  
slides are due this Friday, so time is short to collect this info.  If  
you have a use of either Lucene or Solr that you'd like me to feature  
in the talk, send me (offline please) some details about your  
application, some stats like number of documents, queries per second,  
faceting uses, and so on, and any public pointers I can include.  If  
your system isn't publicly accessible you may also send me a screenshot.


Thanks for your help!

Erik





Re: Indexing sections of TEI XML files

2008-08-13 Thread Erik Hatcher

Have you looked at XTF?   http://www.cdlib.org/inside/projects/xtf/

It does what you're after and much, much more.

Erik


On Aug 13, 2008, at 4:03 AM, [EMAIL PROTECTED] wrote:


Dear users,

Question on approaches to indexing TEI XML or similar section/ 
subsectioned

files.

I'm indexing TEI P4 XML files using Lucene 2.x.

Currently, each TEI XML file corresponds to a Lucene document.
I extract the data from each XML file using XPath expressions e.g.  
for the
body text: /TEI.2/text//p. I also extract and store various meta  
data

e.g. author, title, publishing data etc. per document.

The issue is that TEI documents can be very large and contain several
chapters. Ideally, search terms would return references to chapter(s)
in which the terms were found. The user would then follow a  
hyperlink to a

particular subsection rather than retrieving the entire file.

I think it is possible to transform TEI files into chapterised  
sections

using XSLT although I have not managed this yet. The final system
is likely to use Apache Cocoon to present documents in various  
formats but

that is a separate issue.

I'm tending towards a solution involving indexing each section as a
document (possibly with only the front-matter being associated with  
the

meta data e.g. title) and then maybe using XPointer to associate the
source document.

Any comments/approaches taken to similar issues appreciated.

Thanks,

Aodh Ó Lionáird.











Re: Listing fields in an index

2008-08-13 Thread Erik Hatcher


On Aug 13, 2008, at 5:02 AM, John Patterson wrote:
How do I list all the fields in an index? Some documents do not  
contain all

fields.


Have a look at IndexReader#getFieldNames().  That'll give you back  
field names regardless of which documents have them.


Erik
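For example (Lucene 2.x-era API; the index path is a placeholder, exception handling omitted):

    IndexReader reader = IndexReader.open("/path/to/index");
    Collection fieldNames = reader.getFieldNames(IndexReader.FieldOption.ALL);
    System.out.println(fieldNames);   // every field name that occurs anywhere in the index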





Re: SpanRegexQuery

2008-08-01 Thread Erik Hatcher


On Jul 31, 2008, at 10:06 PM, Christopher M Collins wrote:
I'm trying to use SpanRegexQuery as one of the clauses in my  
SpanQuery.
When I give it a regex like: L[a-z]+ing and do a rewrite on the  
final
query I get terms like Labinger and Lackonsingh along with the  
expected

terms Labeling, Lacing, etc.  It's as if the regex is treated as a
find() and not a match() in Java.  Is there a way to make it  
behave

like a full match, and not a prefix regex?


There are two implementations of the regex engine built into  
SpanRegexQuery, one using Java's java.util.regex, the other using  
Jakarta Regexp.  The default implementation is java.util.regex, which  
matches like this:


  pattern.matcher(string).lookingAt()

And Jakarta Regexp matches like this:

  regexp.match(string)

I'm not sure myself of the differences between these two without doing some  
tests, but certainly they should, ahem, match in at least the  
expectation of whether there is an implied ^string$ or not.  But at a  
quick glance at the respective javadocs, it does seem like the  
java.util.regex implementation should be using  
pattern.matcher(string).matches() instead.  lookingAt() always starts  
at the beginning, so there is an implied ^string effect, but not so  
with the Jakarta Regexp implementation.


As Daniel mentioned, putting a $ at the end should do the trick, and  
seems to me that it really should be necessary... but so should ^ in  
front if you want it to start at the beginning and not match anywhere  
in the string.


Changing JavaUtilRegexCapabilities to use matches() seems like the  
right thing to do, but that'd break backwards compatibility.  *ugh*


Erik
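To make the lookingAt()/matches() difference above concrete, a quick standalone check (plain java.util.regex, nothing Lucene-specific):

    import java.util.regex.Pattern;

    public class LookingAtVsMatches {
        public static void main(String[] args) {
            Pattern p = Pattern.compile("L[a-z]+ing");
            System.out.println(p.matcher("Labeling").lookingAt());  // true
            System.out.println(p.matcher("Labinger").lookingAt());  // true -- "Labing" matches at the start
            System.out.println(p.matcher("Labeling").matches());    // true
            System.out.println(p.matcher("Labinger").matches());    // false -- the whole string must match
        }
    }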





Re: Sorting case-insensitively

2008-07-01 Thread Erik Hatcher


On Jun 30, 2008, at 11:08 PM, Paul J. Lucas wrote:


On Jun 30, 2008, at 7:00 PM, Erik Hatcher wrote:


On Jun 30, 2008, at 8:55 PM, Paul J. Lucas wrote:
If I have a SortField with a type of STRING, is there any way to  
sort in a case-insensitive manner?


Only if you unify the case (lower case everything) on the client  
side that you send to Solr, but in general no.


You can use a text field type that uses a KeywordTokenizer(Factory)  
and lowercase on the Solr-side though.  The Solr example schema has  
one such alphaOnlySort field type.


Couldn't I also use a custom SortComparator?


Oops, sorry, my reply was off-base for java-user.  I replied as if,  
obviously, it was solr-user.


And yes, you could use a custom SortComparator for this case.

Erik
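One way to do it without writing a custom comparator, sketched with placeholder field names: index a lowercased, untokenized copy of the field and sort on that.

    doc.add(new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED));
    doc.add(new Field("title_sort", title.toLowerCase(), Field.Store.NO, Field.Index.UN_TOKENIZED));

    // at search time:
    Sort sort = new Sort(new SortField("title_sort", SortField.STRING));
    Hits hits = searcher.search(query, sort);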





Re: Sorting case-insensitively

2008-06-30 Thread Erik Hatcher


On Jun 30, 2008, at 8:55 PM, Paul J. Lucas wrote:
If I have a SortField with a type of STRING, is there any way to  
sort in a case-insensitive manner?


Only if you unify the case (lower case everything) on the client side  
that you send to Solr, but in general no.


You can use a text field type that uses a KeywordTokenizer(Factory)  
and lowercase on the Solr-side though.  The Solr example schema has  
one such alphaOnlySort field type.


Erik





Re: Search against an index on a mapped drive ...

2008-03-14 Thread Erik Hatcher


On Mar 14, 2008, at 8:22 AM, Mathieu Lecarme wrote:


Dragon Fly wrote:

Hi,

I'd like to find out if I can do the following with Lucene (on  
Windows).


On server A:
- An index writer creates/updates the index.  The index is  
physically stored on server A.

- An index searcher searches against the index.

On server B:
- Maps to the index directory.
- An index searcher searches against the index (physically on  
server A).


On server C (same setup as server B):
- Maps to the index directory.
- An index searcher searches against the index (physically on  
server A).


Has anyone done anything similar? Thank you.

With a shared drive, you will have SMB latency. Why not use a copy,  
or an RMI call?


Or Solr!  :)

Erik





Re: Scoring a query with OR's

2008-03-09 Thread Erik Hatcher


On Mar 9, 2008, at 12:39 PM, Ghinwa Choueiter wrote:
but what exactly happens when there are OR's, for eg.  (life OR  
place OR time)


The scoring equation can get a score for life, place, time  
separately, but what does it do with them then? Does it also add them.


The coord factor kicks in then:

http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/DefaultSimilarity.html#coord(int,%20int)


the formula listed here should help too:

http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html


Erik
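For reference, DefaultSimilarity computes coord(q, d) = overlap / maxOverlap, so a document matching two of three OR'd clauses has its summed clause scores multiplied by 2/3, while a document matching all three keeps the full sum.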





Re: Scoring a query with OR's

2008-03-09 Thread Erik Hatcher
With AND, _all_ clauses are required, not just most.   With OR, the  
idea is to reward documents that match more clauses.


Erik


On Mar 9, 2008, at 1:38 PM, Ghinwa Choueiter wrote:
but shouldn't the coord factor kick in with AND instead of OR? I  
understand why you would want to use coord in the case of AND,  
where you reward more the documents that contain most of the terms  
in the query. However in the case of OR, it should not matter if  
all the OR  operands are in the document?


-Ghinwa

- Original Message - From: Erik Hatcher  
[EMAIL PROTECTED]

To: java-user@lucene.apache.org
Sent: Sunday, March 09, 2008 1:22 PM
Subject: Re: Scoring a query with OR's




On Mar 9, 2008, at 12:39 PM, Ghinwa Choueiter wrote:
but what exactly happens when there are OR's, for eg.  (life OR   
place OR time)


The scoring equation can get a score for life, place, time   
separately, but what does it do with them then? Does it also add  
them.


The coord factor kicks in then:

http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/DefaultSimilarity.html#coord(int,%20int)


the formula listed here should help too:

http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html


Erik












Re: Rails and lucene

2008-02-20 Thread Erik Hatcher
And if you're using Solr with Ruby, the solr-ruby gem is the way to  
go (gem install solr-ruby).


And if you're interested in trying out a glorious Rails/Solr hack,  
try out Solr Flare, which presents a rudimentary search/faceted/ 
suggest interface:


Erik


On Feb 20, 2008, at 5:10 AM, Briggs wrote:


I agree with using Solr.  Solr can output ruby code so it can be
immediately evaluated.

http://wiki.apache.org/solr/SolRuby?highlight=%28CategoryQueryResponseWriter%29%7C%28%28CategoryQueryResponseWriter%29%29


Solr is located at:
http://lucene.apache.org/solr/



On Feb 19, 2008 3:25 PM, Kyle Maxwell [EMAIL PROTECTED] wrote:

Hi guys,
An idea has been on my mind: I want to integrate Lucene into my
Ruby application. The newest Lucene API provides an interface for
Ruby applications, but unfortunately I have no experience with it. Let's talk

about it.


Use Solr, or integrate Lucene via JRuby.  I cannot recommend Ferret.

--
Kyle Maxwell
Software Engineer
CastTV, Inc
http://www.casttv.com








--
Conscious decisions by conscious minds are what make reality real







Re: Has SpanRegexQuery been deprecated in lucene 2.3.0?

2008-02-12 Thread Erik Hatcher
Erica - it has never been in the core JAR.  It should be available  
in the lucene-regex-2.3.0.jar


Erik


On Feb 12, 2008, at 10:01 AM, Mitchell, Erica wrote:


Hi,

I've downloaded lucene 2.3.0 and the jar lucene-core-2.3.0.jar does  
not

contain the SpanRegexQuery class.
Has this been deprecated?

Thanks,
Erica


IONA Technologies PLC (registered in Ireland)
Registered Number: 171387
Registered Address: The IONA Building, Shelbourne Road, Dublin 4,  
Ireland






Re: problem with Whitespace analyzer

2008-02-10 Thread Erik Hatcher
QueryParser uses special syntax, which can get in the way, for  
operators and grouping, etc.  Parentheses are part of that special  
syntax, and need to be backslash escaped for QueryParser to skip  
treating them as grouping operators, for example: Ajit_\(Agarkar\)


Erik
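If the whole input should be taken literally, QueryParser.escape() will do the backslashing for you (the field name and analyzer here are placeholders):

    QueryParser parser = new QueryParser("contents", new WhitespaceAnalyzer());
    Query query = parser.parse(QueryParser.escape("Ajit_(Agarkar)"));   // parses as a single term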



On Feb 10, 2008, at 2:03 AM, saikrishna venkata pendyala wrote:


Hi,

I am facing a small problem, some one please help me,

I am using Whitespace analyzer, while both indexing and searching  
the files.


While indexing, the analyzer is recognizing <token>Ajit_(Agarkar)</token> (I
found it using LUKE) as a single token.
But while searching {QueryParser parser = new QueryParser(field, analyzer);},
it is divided into two tokens: <token>Ajit_</token>, <token>Agarkar</token>.



Enter query:
Ajit_(Agarkar)
Searching for: Ajit_ Agarkar
0 total matching documents




--Saikrishna.






Re: appending field to an existing index

2008-02-02 Thread Erik Hatcher
One option is to index the new field in sync (same index order) into  
a new index, and search using a ParallelReader.


Erik
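A minimal sketch of the search side (index paths are placeholders; both indexes must have their documents in the same order):

    ParallelReader parallel = new ParallelReader();
    parallel.add(IndexReader.open("/path/to/original-index"));
    parallel.add(IndexReader.open("/path/to/new-field-index"));
    IndexSearcher searcher = new IndexSearcher(parallel);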


On Jan 30, 2008, at 7:42 PM, John Wang wrote:


Hi all:

We have a large index and it is difficult to reindex.

We want to add another field to the index without reindexing,  
e.g. just

create a new inverted index, dictionary files etc.

How feasible is it to add this to lucene?


Thanks

-John






Re: Update of Lucene in Action ?

2008-01-18 Thread Erik Hatcher


On Jan 18, 2008, at 7:16 AM, thrgroovyboy wrote:

Is the book Lucene In Action updated ?
Or is it the same version based on lucene 1.4 ?


The first, and currently only, edition is based on Lucene 1.4.3, and  
all code works with Lucene 1.9 as well.  Lucene 2.0+ changed some  
API, but it is easy to convert the code to be compatible.


A second edition is in progress at the moment.  Expected completion  
date:  ??  (your guess is as good as mine! :) - but let's say early  
2009 just to be a bit safer than promising this year.  The second  
edition will be current with whatever release of Lucene is available  
at the time it is published - probably 3.0.


Erik





Re: When to use which Analyzer

2008-01-13 Thread Erik Hatcher


On Jan 13, 2008, at 12:08 PM, [EMAIL PROTECTED] wrote:
I have some doubts about Analyzer usage. I read that one shall  
always use

the same analyzer for searching and indexing.
Why? How does the Analyzer effect the search process? What is  
analyzed here

again?


As you surmised, it is because QueryParser analyzes fragments of the  
query string in order to get the query to match the terms indexed.


I can see that when I use the SimpleAnalyzer again, the values of  
my search

are all converted to lowercase and numbers are removed.
This leads to wrong results, because my values are stored with
Field.Index.UN_TOKENIZED.

Why is my query changed this way?

I think it has to do with QueryParsing, which uses an analyzer. Right?

Can I create a query directly, without parsing?


Yes, there are many Query subclasses in Lucene that you can use  
directly.




Or in other words:

How can I search for fields stored with Field.Index.UN_TOKENIZED?


Use TermQuery.



Why do I need an analyzer for searching?


Consider a full-text field that will be tokenized removing special  
characters and lowercased, and then a user querying for an uppercase  
word.   The main thing is that queries need to jive with how things  
got indexed, Analyzer in the mix or not.


Erik
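For the UN_TOKENIZED case above, a sketch with placeholder names: the field is indexed as a single term, exactly as given, so the TermQuery must use the exact original value.

    doc.add(new Field("status", "In-Progress", Field.Store.YES, Field.Index.UN_TOKENIZED));

    // at search time:
    Query query = new TermQuery(new Term("status", "In-Progress"));   // "in-progress" would not match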





Re: FuzzyQuery - prefixLength - use with QueryParser?

2007-12-17 Thread Erik Hatcher


On Dec 17, 2007, at 3:31 AM, Helmut Jarausch wrote:

FuzzyQuery (in the 2.2.0 API) may take 3 arguments,
term , minimumSimilarity and prefixLength

Is there any syntax to specify the 3rd argument
in a query term for QueryParser?
(I haven't found any in the current docs)



No, there isn't.  But you can set it via the API, see  
QueryParser#setFuzzyPrefixLength(int)


Erik
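For example (field name and analyzer are placeholders):

    QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
    parser.setFuzzyPrefixLength(3);               // first 3 characters must match exactly
    Query query = parser.parse("apache~0.7");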





Re: Query.rewrite - help me to understand it

2007-12-17 Thread Erik Hatcher


On Dec 17, 2007, at 5:14 AM, qvall wrote:

So does it mean that if my query doesn't contain prefix or wild-card
queries then I don't need to use rewrite() for highlighting?


As long as the terms you want highlighted are extractable from the  
Query instance, all is fine.


However, it wouldn't hurt to always rewrite.  Primitive queries short  
circuit the rewriting anyway, so it's not as though you're burning  
much unnecessary time/IO in the rewrite call.


Erik
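A sketch of that with the contrib Highlighter (reader, analyzer, field name, and text are placeholders):

    Query rewritten = query.rewrite(reader);   // expands prefix/wildcard/fuzzy into primitive term queries
    Highlighter highlighter = new Highlighter(new QueryScorer(rewritten));
    String fragment = highlighter.getBestFragment(analyzer, "contents", text);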





Re: help required ... ~ operator

2007-12-10 Thread Erik Hatcher


On Dec 10, 2007, at 4:48 AM, Shakti_Sareen wrote:
 I am using StandardAnalyzer() to index the data. I am getting  
false

hits in ~ operator query.

Actual data is: signals by magnets of different strength
and when I am parsing a query: "signals strength"~2, I am getting a
hit which is a false result.

I am using QueryParser.

Please help on this issue.


Chances are that you've got a stop word remover in the mix, and "by"  
and "of" are being removed, thus making the words close enough for a  
match.  The built-in stop filter does not leave gaps for removed  
words.  So you could either use a custom stop filter or remove it  
altogether to keep those words there.


Erik
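For example, removing stop word filtering entirely is as simple as passing an empty stop word list (use the same analyzer at index and query time):

    Analyzer analyzer = new StandardAnalyzer(new String[0]);   // keeps "by" and "of" in the index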





Re: Can I search in realtime?

2007-12-07 Thread Erik Hatcher


On Dec 6, 2007, at 9:02 PM, 游泳池的鱼 wrote:
Hi, it's my first time using the Lucene mailing list. I have a problem:
when I add a document with IndexWriter, is it searchable by an IndexSearcher
instance that was created before the document was flushed to the index? If
Lucene cannot do this, any suggestions to solve this problem?


Yes... recreate a new IndexSearcher after adding documents :)
That's the Lucene Way.


Erik
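A minimal sketch (index path is a placeholder):

    writer.addDocument(doc);
    writer.close();   // make sure the new document has reached the index

    // a searcher only sees the index as of when its reader was opened,
    // so open a fresh one to see the newly added document
    IndexSearcher searcher = new IndexSearcher("/path/to/index");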





