Unsubscribe

 


    On Tuesday, November 10, 2015 12:37 PM, 
"clucene-developers-requ...@lists.sourceforge.net" 
<clucene-developers-requ...@lists.sourceforge.net> wrote:
 

 Send CLucene-developers mailing list submissions to
    clucene-developers@lists.sourceforge.net

To subscribe or unsubscribe via the World Wide Web, visit
    https://lists.sourceforge.net/lists/listinfo/clucene-developers
or, via email, send a message with subject or body 'help' to
    clucene-developers-requ...@lists.sourceforge.net

You can reach the person managing the list at
    clucene-developers-ow...@lists.sourceforge.net

When replying, please edit your Subject line so it is more specific
than "Re: Contents of CLucene-developers digest..."


Today's Topics:

  1. CLucene index query fails with 5GB of data (Shailesh Birari)
  2. Performing case insensitive searches ? (norbert barichard)
  3. Re: Performing case insensitive searches ? (cel tix44)
  4. Indexing fails with ..    FIELDS_INDEX_EXTENSION).c_str() )'
      failed (Akash)
  5. 'More Like This' feature in clucene (Abhay Rawat)


----------------------------------------------------------------------

Message: 1
Date: Tue, 24 Mar 2015 11:26:15 +1300
From: Shailesh Birari <sbirar...@gmail.com>
Subject: [CLucene-dev] CLucene index query fails with 5GB of data
To: clucene-developers@lists.sourceforge.net,     Shailesh Birari
    <sbirar...@gmail.com>
Message-ID:
    <CAE8-Fr=3-j4xnvpgal9r05wtarykosq8xijasc5n4uejbde...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello,

I am observing strange behavior in CLucene with large data (though it's
not that large).

I have 40,000 HTML documents (around 5GB of data). I added these documents
to a Lucene index. When I search for a word in this index, I get
zero results.

If I take a subset of these documents (only 170 documents) and create an
index, then the same search works.

Note that I used the same code to create both indexes.

Here is what I am doing to add a string to the index. (Note that I am
passing the document contents as a string.)

void LuceneLib::AddStringToDoc(Document *doc, const char *fieldName,
                               const char *str)
{
    wchar_t *wstr = charToWChar(fieldName);
    wchar_t *wstr2 = charToWChar(str);

    bool isHighlighted = false;
    bool isStoreCompressed = false;

    for (size_t i = 0; i < highlightedFields.size(); i++)
    {
        if (highlightedFields.at(i).compare(fieldName) == 0) {
            isHighlighted = true;
            break;
        }
    }

    for (size_t i = 0; i < compressedFields.size(); i++)
    {
        if (compressedFields.at(i).compare(fieldName) == 0) {
            isStoreCompressed = true;
            break;
        }
    }

    cout << "Field : " << fieldName << " ";
    int fieldConfig = Field::INDEX_TOKENIZED;

    if (isHighlighted == true) {
        fieldConfig = fieldConfig | Field::TERMVECTOR_WITH_POSITIONS_OFFSETS;
        cout << " Highlighted";
    }

    if (isStoreCompressed == true) {
        fieldConfig = fieldConfig | Field::STORE_COMPRESS;
        cout << " Store Compressed";
    }
    else {
        fieldConfig = fieldConfig | Field::STORE_NO;
        cout << " Do not store";
    }
    cout << " : " << fieldConfig << endl;

    Field *field = _CLNEW Field((const TCHAR *) wstr, (const TCHAR *) wstr2,
                                fieldConfig);
    doc->add(*field);

    delete[] wstr;
    delete[] wstr2;
}


I checked the field config values and those are as below:
Field : docName  Do not store : 34
Field : docPath  Do not store : 34
Field : docContent  Highlighted Store Compressed : 3620
Field : All  Do not store : 34
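
The printed totals can be broken back down into individual flags. The
sketch below does that with a local copy of the flag values. The numeric
values are an assumption: they are the ones that reproduce 34 and 3620,
with TERMVECTOR_WITH_POSITIONS_OFFSETS taken to be 512 + 1024 + 2048.
describeFieldConfig, TERMVECTOR_POSITIONS and TERMVECTOR_OFFSETS are
hypothetical names used only for illustration, so check Field.h in your
own CLucene build before relying on them.

```cpp
#include <string>

// Assumed flag values (not copied from this thread); they are chosen so
// that the totals printed above, 34 and 3620, decompose cleanly.
enum FieldFlag {
    STORE_YES = 1,
    STORE_NO = 2,
    STORE_COMPRESS = 4,
    INDEX_NO = 16,
    INDEX_TOKENIZED = 32,
    INDEX_UNTOKENIZED = 64,
    INDEX_NONORMS = 128,
    TERMVECTOR_NO = 256,
    TERMVECTOR_YES = 512,
    TERMVECTOR_POSITIONS = 1024,  // hypothetical name for the "positions" bit
    TERMVECTOR_OFFSETS = 2048     // hypothetical name for the "offsets" bit
};

// Returns the names of all flags set in `config`, lowest bit first,
// joined with " | ". Returns "(none)" if no known flag is set.
std::string describeFieldConfig(int config) {
    static const struct { int bit; const char* name; } kFlags[] = {
        {STORE_YES, "STORE_YES"},
        {STORE_NO, "STORE_NO"},
        {STORE_COMPRESS, "STORE_COMPRESS"},
        {INDEX_NO, "INDEX_NO"},
        {INDEX_TOKENIZED, "INDEX_TOKENIZED"},
        {INDEX_UNTOKENIZED, "INDEX_UNTOKENIZED"},
        {INDEX_NONORMS, "INDEX_NONORMS"},
        {TERMVECTOR_NO, "TERMVECTOR_NO"},
        {TERMVECTOR_YES, "TERMVECTOR_YES"},
        {TERMVECTOR_POSITIONS, "TERMVECTOR_POSITIONS"},
        {TERMVECTOR_OFFSETS, "TERMVECTOR_OFFSETS"},
    };
    std::string out;
    for (const auto& flag : kFlags) {
        if (config & flag.bit) {
            if (!out.empty()) out += " | ";
            out += flag.name;
        }
    }
    return out.empty() ? "(none)" : out;
}
```

Under these assumptions, describeFieldConfig(34) yields STORE_NO and
INDEX_TOKENIZED, and describeFieldConfig(3620) yields STORE_COMPRESS,
INDEX_TOKENIZED and the three term-vector bits, which matches the
"Do not store" and "Highlighted Store Compressed" labels in the output.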


The field on which I am doing a query is docContent.

Please let me know if I have missed anything.

Thanks,
  Shailesh

------------------------------

Message: 2
Date: Wed, 25 Mar 2015 13:51:15 +0100
From: norbert barichard <norbert.barich...@diginext.fr>
Subject: [CLucene-dev] Performing case insensitive searches ?
To: clucene-developers@lists.sourceforge.net
Message-ID: <5512af43.80...@diginext.fr>
Content-Type: text/plain; charset=windows-1252; format=flowed

Hello,

Is there a way to tell CLucene to be case-insensitive when performing a
search? It's a bit annoying that when I do a search, I don't get any
results if I don't get all the upper-case letters right.

Thanks in advance !




------------------------------

Message: 3
Date: Wed, 1 Apr 2015 08:38:14 +1100
From: cel tix44 <celti...@gmail.com>
Subject: Re: [CLucene-dev] Performing case insensitive searches ?
To: clucene-developers@lists.sourceforge.net
Message-ID:
    <caalxmkvg3+htoshihf75zk4ysw3evkxnyb8jdq-ddmpmnwj...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Norbert, I guess you need to check the analyzer you're using to create your
indexes, as well as the analyzer you use for searches. You probably need to
use an analyzer (both for indexing and searching) that uses LowerCaseFilter.

Off the top of my head ... check if StandardAnalyzer (both for indexing and
searching) does what you want.

To get a better explanation, google for: lucene case insensitive search

From what you'll find for Java Lucene -- you'll get an idea of the way to
go.

To inspect the contents of your index, you can use Luke (google for: luke
lucene) -- you'll see straight away if your index has case-sensitive terms.
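
The idea behind the advice above can be shown without any CLucene code:
if the same lowercasing step is applied to terms at index time and to the
query at search time, matching becomes case-insensitive. The sketch below
is a toy inverted index, not the CLucene API. ToyIndex and toLower are
hypothetical names, and tokens are simply split on whitespace.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Lowercases a token; stands in for the lowercasing step of an analyzer.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Toy inverted index (hypothetical, for illustration only).
class ToyIndex {
public:
    // Index time: split on whitespace and store each token lowercased.
    void addDocument(int docId, const std::string& text) {
        std::istringstream in(text);
        std::string token;
        while (in >> token) {
            postings_[toLower(token)].insert(docId);
        }
    }

    // Search time: apply the same lowercasing to the query term.
    std::set<int> search(const std::string& term) const {
        auto it = postings_.find(toLower(term));
        return it == postings_.end() ? std::set<int>() : it->second;
    }

private:
    std::map<std::string, std::set<int>> postings_;
};
```

If only one side is lowercased (for example, lowercase terms in the index
but a raw upper-case query), the lookup misses, which is the symptom
described in the question.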

Regards
Celto

On Wed, Mar 25, 2015 at 11:51 PM, norbert barichard <
norbert.barich...@diginext.fr> wrote:

> Hello,
>
> Is there a way to tell CLucene to be case-insensitive when performing a
> search? It's a bit annoying that when I do a search, I don't get any
> results if I don't get all the upper-case letters right.
>
> Thanks in advance !
>
>
>
> _______________________________________________
> CLucene-developers mailing list
> CLucene-developers@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/clucene-developers
>

------------------------------

Message: 4
Date: Wed, 14 Oct 2015 02:27:56 +0530
From: Akash <akbwiz+cluc...@gmail.com>
Subject: [CLucene-dev] Indexing fails with ..
    FIELDS_INDEX_EXTENSION).c_str() )' failed
To: clucene-developers@lists.sourceforge.net
Message-ID: <8e911626ff9166d5b8f3ab4db49f6...@mailjol.in>
Content-Type: text/plain; charset=US-ASCII; format=flowed

Hi,

I am using Dovecot with its clucene plugin for indexing. I am hitting an
error while trying to index a large folder of emails. Sometimes it
throws this error after 30000 emails, sometimes 40000; on the latest run it
gave up after 111000. But it never completes. On the Dovecot list, I
was told that it's probably a CLucene library bug which they can't do much
about, and it was suggested that I switch to Solr (which I don't want to do).
Is there a fix for this?

111000/322080 doveadm:
/home/stephan/packages/wheezy/i386/clucene-core-2.3.3.4/src/core/CLucene/index/DocumentsWriter.cpp:210:
std::string lucene::index::DocumentsWriter::closeDocStore(): Assertion
`numDocsInStore*8 == directory->fileLength( (docStoreSegment + "." +
IndexFileNames::FIELDS_INDEX_EXTENSION).c_str() )' failed.
Aborted
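
For readers unfamiliar with the assertion, it compares the length of the
stored-fields index file with numDocsInStore*8, i.e. 8 bytes per
document. The sketch below only restates that arithmetic. The function
names are hypothetical, and nothing here explains why the lengths
diverge in this case.

```cpp
#include <cstdint>

// Bytes per document in the fields index file, taken from the
// "numDocsInStore*8" term of the assertion text above.
constexpr std::int64_t kBytesPerDocEntry = 8;

// Length the fields index file is expected to have for a given doc count.
std::int64_t expectedFieldsIndexLength(std::int64_t numDocsInStore) {
    return numDocsInStore * kBytesPerDocEntry;
}

// Mirrors the failed check: true when the file length matches the count.
bool fieldsIndexLengthMatches(std::int64_t numDocsInStore,
                              std::int64_t actualFileLength) {
    return expectedFieldsIndexLength(numDocsInStore) == actualFileLength;
}
```

For example, a doc store holding 1000 documents is expected to have a
fields index file of 8000 bytes; any other length trips the assertion.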

I am using dovecot 2:2.2.19-1~auto+7 and libclucene-core1:i386 2.3.3.4-4
from Debian wheezy backports. Please advise.

-Akash



------------------------------

Message: 5
Date: Tue, 10 Nov 2015 10:37:27 +0000
From: Abhay Rawat <abhay.ra...@hornbill.com>
Subject: [CLucene-dev] 'More Like This' feature in clucene
To: "clucene-developers@lists.sourceforge.net"
    <clucene-developers@lists.sourceforge.net>
Message-ID:
    <bf9b50625739a8499563c190deb4e98d50e6c3b...@hcl-exch2k7.internal.hornbill.com>
Content-Type: text/plain; charset="us-ascii"

Hello,

Java Lucene currently has a feature called "More Like This", which finds
the representative terms of a document; these can then be used to search
for similar documents.
I looked in the latest CLucene code but could not find this functionality.

Is it available in CLucene? If not, are there any plans to include it?
Or if someone has done some work on this or in a similar area, it would be
great to hear from them.
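
To make "representative terms" concrete, here is a minimal sketch that
ranks the terms of one document by a simple tf-idf score against a small
corpus and returns the top few. It is not the Java Lucene MoreLikeThis
implementation and uses no CLucene API. representativeTerms and
countTerms are hypothetical names, the idf formula is one common
smoothed variant chosen for illustration, and tokens are split on
whitespace with no case folding or stop-word handling.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Counts how often each whitespace-separated token occurs in `text`.
std::map<std::string, int> countTerms(const std::string& text) {
    std::map<std::string, int> counts;
    std::istringstream in(text);
    std::string token;
    while (in >> token) ++counts[token];
    return counts;
}

// Returns up to `maxTerms` terms of `doc`, highest tf-idf score first.
// Ties are broken alphabetically so the result is deterministic.
std::vector<std::string> representativeTerms(
        const std::string& doc,
        const std::vector<std::string>& corpus,
        std::size_t maxTerms) {
    // Document frequency: number of corpus documents containing each term.
    std::map<std::string, int> docFreq;
    for (const std::string& other : corpus) {
        for (const auto& entry : countTerms(other)) ++docFreq[entry.first];
    }

    std::vector<std::pair<double, std::string>> scored;
    for (const auto& entry : countTerms(doc)) {
        auto it = docFreq.find(entry.first);
        const int df = (it == docFreq.end()) ? 0 : it->second;
        // Smoothed idf: rarer terms get a higher weight.
        const double idf =
            std::log((1.0 + corpus.size()) / (1.0 + df)) + 1.0;
        scored.emplace_back(entry.second * idf, entry.first);
    }

    std::sort(scored.begin(), scored.end(),
              [](const std::pair<double, std::string>& a,
                 const std::pair<double, std::string>& b) {
                  if (a.first != b.first) return a.first > b.first;
                  return a.second < b.second;
              });

    std::vector<std::string> terms;
    for (std::size_t i = 0; i < scored.size() && i < maxTerms; ++i) {
        terms.push_back(scored[i].second);
    }
    return terms;
}
```

The returned terms could then be combined into an ordinary OR query
against the index to retrieve similar documents.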

Thanks
Abhay


------------------------------



End of CLucene-developers Digest, Vol 90, Issue 1
*************************************************


  