Hi Michael,

I just found my problem. Because Lucene was indexing "abcd." as a single
word, we had added a regex to our code that inserts a space between "abcd"
and "." (or other punctuation characters). That regex also split decimal
numbers such as 404.50.

I have updated the regex and it now works fine.

The code before:
// Add a space between a word and punctuation characters
String pattern = "(\\w)([\\.,;\\?!:])";
contents = contents.replaceAll(pattern, "$1 $2");

The code after:
// Leave numbers alone, otherwise amounts would be cut in two
// REGEX: a word character (\w) followed by one of . , ; ? ! :
// unless that punctuation is followed, after optional whitespace, by a digit
String pattern = "(\\w)([\\.,;\\?!:])(?!(\\s*[0-9]))";
contents = contents.replaceAll(pattern, "$1 $2");
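
For readers of the thread, here is a minimal sketch comparing the two patterns
quoted above. The class name, method names, and sample sentence are
hypothetical and not taken from the original application; only the two regex
strings and the replacement come from this message.

```java
import java.util.regex.Pattern;

public class PunctuationSpacing {
    // Pattern from "the code before": any word character followed by punctuation.
    static final Pattern BEFORE = Pattern.compile("(\\w)([\\.,;\\?!:])");

    // Pattern from "the code after": same, but skipped when the punctuation
    // is followed by optional whitespace and then a digit.
    static final Pattern AFTER = Pattern.compile("(\\w)([\\.,;\\?!:])(?!(\\s*[0-9]))");

    static String applyBefore(String contents) {
        return BEFORE.matcher(contents).replaceAll("$1 $2");
    }

    static String applyAfter(String contents) {
        return AFTER.matcher(contents).replaceAll("$1 $2");
    }

    public static void main(String[] args) {
        String sample = "Invoice total 404.50, paid."; // hypothetical sample text
        System.out.println(applyBefore(sample)); // Invoice total 404 .50 , paid .
        System.out.println(applyAfter(sample));  // Invoice total 404.50 , paid .
    }
}
```

With the first pattern the amount becomes "404 .50", which would be consistent
with searches for 404 and 50 matching while 404.50 did not. Note that because
of the `\s*` in the lookahead, the second pattern also leaves punctuation
untouched when it is followed by a space and then a digit (for example the
period in "item 3. 50 units").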

Thanks a lot for your time.

Good bye

Jérémy GUYENOT | Head of R&D
jguye...@efalia.com
-----------
49, av. de la République 69200 Vénissieux | Tel: 04 72 51 77 55 | Fax: 04 72
50 43 13
WWW.EFALIA.COM <http://www.efalia.com/>
For technical follow-up of your requests, please use Mantis
<http://feqa.communauteged-multigest.fr/>, our online tool.

Eco-responsibility: print this email only if necessary

From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Tuesday, 27 September 2016 16:19
To: Lucene Users <java-user@lucene.apache.org>; Jérémy GUYENOT
<jguye...@efalia.com>
Cc: Jan Høydahl <jan....@cominvent.com>
Subject: Re: Research problems on numeric values into text (with. or,)

Possibly you are using an analyzer that does not preserve decimal numbers as a 
single token?  Or, you are using a different analyzer at indexing time vs 
search time?

Can you make a small test case showing the issue?

Mike McCandless

http://blog.mikemccandless.com

On Tue, Sep 27, 2016 at 3:06 PM, Jérémy GUYENOT <jguye...@efalia.com> wrote:
Hello,

Sorry for the multiple posts, but my first post got no answers, so I am
trying another way.

What are you indexing?
I want to index files such as the one in the "ZIP \ file" folder, which
contains decimal data (with . or , as the decimal separator).

How are you searching, and what did you expect to find?
I want to be able to search for decimal values because our tools store large
quantities of such documents (e.g. invoices, quotes, orders).

What do you actually see and why is that a problem?
A search for the number 404 returns the files containing 404.
A search for the number 50 returns the files containing 50.
A search for the number 404.50 returns no results.
The text content is indexed in a TextField with Field.Store.NO.
I tried several analyzers but the result is the same. I also tried Lucene
4.3.1 and 6.2.0, with the same result.

I hope you can give me some guidance on searching for decimal values in text
files.

In the zip you can find:

- File: the example file containing decimal values
- Index: the Lucene index files
- Indexationlucene: the code we use to index files from our app
- RechercheLucene: the code we use to search from our app

Cordially

Jérémy GUYENOT

From: Jan Høydahl [mailto:jan....@cominvent.com]
Sent: Tuesday, 27 September 2016 10:20
To: java-user@lucene.apache.org
Cc: Jérémy GUYENOT <jguye...@efalia.com>
Subject: Re: Research problems on numeric values into text (with. or,)

Please do not cross-post to multiple mailing lists.
This belongs to java-user only.
It is also generally better to describe the problem in more detail in the mail, 
than attaching a zip.
- What are you indexing
- How are you searching, and what did you expect to find
- What do you actually see and why is that a problem?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 27 Sep 2016, at 10:15, Jérémy GUYENOT <jguye...@efalia.com> wrote:

Hello,
We have a problem searching for numeric values in text (with . or , as the
decimal separator). We are unable to search for 315.86, whichever separator
is used.
We tried custom analyzers, without success.
I have attached the code we use for indexing and the code we use for
searching.
I do not know whether this is a bug on your side or a problem with our
analysis.
The problem is the same in versions 4.3.1 and 6.2.0.
Thank you in advance for your quick reply.
Regards
<LUCENE.zip>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


