Hi, multi-value dimensions will work only in simple generic cases, for
example where logs are plain space-separated words. But even with this form of
data, they need external preprocessing, which will grow over time. At first we
just split on spaces; then we realize we also want to split on all special
characters; then we realize we also want to search by part of a word, so we
add k-skip-n-grams, and so on. This way the external preprocessing will slowly
turn into what Lucene already does. Also, with this approach we can't simply
get the source text back, for queries like select * from table limit 100,
because the data in the multi-value column is split and optimized for search.
So this requires denormalizing the data and costs additional space.
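To make the "preprocessing grows over time" point concrete, here is a rough sketch of the three stages described above. The class `TokenPreprocessor`, its method names, and the sample log line are all hypothetical and not part of Druid or Lucene. Plain character n-grams stand in for the skip-gram variant to keep it short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical external preprocessor that would feed a multi-value dimension.
public class TokenPreprocessor {

    // Stage 1: split on whitespace only.
    static List<String> splitOnWhitespace(String line) {
        return Arrays.asList(line.trim().split("\\s+"));
    }

    // Stage 2: lowercase and split on every run of non-alphanumeric characters.
    static List<String> splitOnSpecialChars(String line) {
        List<String> tokens = new ArrayList<>();
        for (String t : line.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stage 3: character n-grams of one token, to allow partial-word search.
    static List<String> charNGrams(String token, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= token.length(); i++) {
            grams.add(token.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        String line = "GET /api/v1/users?id=42 status=200"; // made-up log line
        System.out.println(splitOnWhitespace(line));
        System.out.println(splitOnSpecialChars(line));
        System.out.println(charNGrams("status", 3));
    }
}
```

Each stage throws away more of the original line (case, punctuation, token order), which is why the source text has to be stored separately if it is ever needed again.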
Simple Lucene indexing looks like this:
```java
import static org.junit.Assert.assertEquals;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

Analyzer analyzer = new StandardAnalyzer();
// Store the index in memory:
Directory directory = new RAMDirectory();
// To store an index on disk, use this instead:
//Directory directory = FSDirectory.open(java.nio.file.Paths.get("/tmp/testindex"));
IndexWriterConfig config = new IndexWriterConfig(analyzer);
IndexWriter iwriter = new IndexWriter(directory, config);
Document doc = new Document();
String text = "This is the text to be indexed.";
doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
iwriter.addDocument(doc);
iwriter.close();
// Now search the index:
DirectoryReader ireader = DirectoryReader.open(directory);
IndexSearcher isearcher = new IndexSearcher(ireader);
// Parse a simple query that searches for "text":
QueryParser parser = new QueryParser("fieldname", analyzer);
Query query = parser.parse("text");
ScoreDoc[] hits = isearcher.search(query, 1000).scoreDocs;
assertEquals(1, hits.length);
// Iterate through the results:
for (int i = 0; i < hits.length; i++) {
    Document hitDoc = isearcher.doc(hits[i].doc);
    assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
}
ireader.close();
directory.close();
```
I think adding it as a new column would be great. The main reason is that
Lucene is heavier than simple token indexing. Mixing columns with indexing
disabled, tokenized columns, and Lucene-indexed columns in one table can
greatly reduce the total disk space required compared to full Lucene indexing.
[ Full content available at:
https://github.com/apache/incubator-druid/issues/6189 ]