Hi all,
I'm trying to index some Russian text and search it using the CLucene
library (v2.3.3.4-10). It works most of the time, but for some words a
wildcard query returns no hits, and I have no idea why.
Can anybody help me with this, please?
Here is my source code:
*main.cc:*
#include <QCoreApplication>
#include <QString>
#include <QDebug>
#include <QScopedPointer>
#include <CLucene.h>
const TCHAR FIELD_CONTENT[] = L"content";
const char INDEX_PATH[] = "/tmp/index";
void create_index(const QString &content)
{
    lucene::analysis::standard::StandardAnalyzer analyzer;
    lucene::index::IndexWriter writer(INDEX_PATH, &analyzer, true);
    lucene::document::Document doc;
    std::wstring content_buffer = content.toStdWString();
    doc.add(*_CLNEW lucene::document::Field(FIELD_CONTENT,
                                            content_buffer.data(),
                                            lucene::document::Field::STORE_NO |
                                            lucene::document::Field::INDEX_TOKENIZED |
                                            lucene::document::Field::TERMVECTOR_NO,
                                            true));
    writer.addDocument(&doc);
    writer.flush();
    writer.close(true);
}
void search(const QString &query_string)
{
    lucene::search::IndexSearcher searcher(INDEX_PATH);
    lucene::analysis::standard::StandardAnalyzer analyzer;
    lucene::queryParser::QueryParser parser(FIELD_CONTENT, &analyzer);
    parser.setAllowLeadingWildcard(true);
    std::wstring query = query_string.toStdWString();
    QScopedPointer< lucene::search::Query >
        lucene_query(parser.parse(query.c_str(), FIELD_CONTENT, &analyzer));
    QScopedPointer< lucene::search::Hits >
        hits(searcher.search(lucene_query.data()));
    TCHAR *query_debug_string(lucene_query->toString());
    qDebug() << "found?" << query_string <<
        QString::fromWCharArray(query_debug_string) << (hits->length() > 0);
    free(query_debug_string);
}
int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);
    create_index(QString("Росси́я официально также Росси́йская Федера́ция"));
    search(QString("noWordLkeThis"));    // ok
    search(QString("Федера́ция"));        // ok
    search(QString("Федер*ция"));        // ERROR: it should work, but it doesn't
    search(QString("Фед*"));             // ok
    search(QString("Федер"));            // ok
    search(QString("\"федера ция\""));   // why is this working?
    search(QString("официально"));       // ok
    search(QString("офиц*ьно"));         // ok
    search(QString("оф*циально"));       // ok
    search(QString("офици*но"));         // ok
    return 0;
}
*cluceneutf8.pro:*
QT -= gui
CONFIG += c++11 console
CONFIG -= app_bundle
CONFIG += link_pkgconfig
PKGCONFIG += libclucene-core
SOURCES += \
main.cc
*Build and run:*
qmake && make && ./cluceneutf8
*The output of the program:*
found? "noWordLkeThis" "content:nowordlkethis" false
found? "Федера́ция" "content:\"федера ция\"" true
found? "Федер*ция" "content:федер*ция" false
found? "Фед*" "content:фед*" true
found? "Федер" "content:федер" false
found? "\"федера ция\"" "content:\"федера ция\"" true
found? "официально" "content:официально" true
found? "офиц*ьно" "content:офиц*ьно" true
found? "оф*циально" "content:оф*циально" true
found? "офици*но" "content:офици*но" true
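One thing that stands out in the output: the query for Федера́ция is parsed as the phrase "федера ция", so the analyzer appears to split the word where the stress mark sits. The stressed vowels in my test string are written as a letter followed by U+0301 (COMBINING ACUTE ACCENT). Here is a minimal sketch in plain C++ (no Qt, no CLucene) for locating such marks in a string. The helper names `is_combining_mark` and `combining_mark_positions` are made up for this example, and checking only the U+0300..U+036F block is an assumption that happens to cover U+0301.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of CLucene or Qt): true for code points in
// the Combining Diacritical Marks block, U+0300..U+036F. This range is an
// assumption; it includes U+0301 COMBINING ACUTE ACCENT.
bool is_combining_mark(char32_t c)
{
    return c >= 0x0300 && c <= 0x036F;
}

// Returns the index of every combining mark found in the text.
std::vector<std::size_t> combining_mark_positions(const std::u32string &text)
{
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (is_combining_mark(text[i]))
            positions.push_back(i);
    }
    return positions;
}
```

Called with `U"Федера\u0301ция"` this reports one mark, right after "Федера", which is where the phrase query above is split. For официально, which has no stress mark, the result is empty.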
It's built with Qt and qmake, but I also have a non-Qt version that I can
share if that would be more useful.
So my problem is that I can search for *Федера́ция*, but a wildcard query
such as *Федер*ция* finds nothing. Wildcard queries on other words, like
*официально*, work as expected.
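A workaround I am considering, in case the stress marks turn out to be the cause: remove the combining accents from both the indexed text and the query before they reach CLucene. This is only a sketch in plain C++. `strip_combining_marks` is a hypothetical helper of my own, not a CLucene or Qt function, and it assumes that dropping code points in U+0300..U+036F is acceptable for this content. I have not verified that it makes the wildcard query match.

```cpp
#include <string>

// Hypothetical helper (not part of CLucene or Qt): returns a copy of the
// text without code points from the Combining Diacritical Marks block,
// U+0300..U+036F (assumed range; it includes U+0301, the stress mark).
std::u32string strip_combining_marks(const std::u32string &text)
{
    std::u32string result;
    result.reserve(text.size());
    for (char32_t c : text) {
        if (c < 0x0300 || c > 0x036F)
            result.push_back(c);
    }
    return result;
}
```

With this, `U"Федера\u0301ция"` becomes the plain ten-letter word Федерация, which I would expect the analyzer to keep as a single term.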
Thanks.
--
Dömők Tamás