Michael, are you still working on replacing the Vector with direct BLOB I/O?

I'm looking into making the sync mode a parameter: lazy syncs of DML operations (calls to LuceneDomainIndex.sync, potentially queued using dbms_aq), which are convenient for bulk inserts, versus real-time syncs for non-bulk operations where data must be retrievable within the transaction.
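A minimal sketch of the two modes, in Java. Everything in it is hypothetical and not part of the LUCENE-724 attachments: SyncMode, IndexSyncer and SyncDispatcher are made-up names, an in-memory queue stands in for dbms_aq, and the IndexSyncer callback stands in for the call to LuceneDomainIndex.sync, whose real signature is not shown in this thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;

/** Hypothetical sync modes; the real parameter name and values are undecided. */
enum SyncMode { LAZY, REAL_TIME }

/** Hypothetical stand-in for whatever applies row changes to the Lucene index. */
interface IndexSyncer {
    void sync(List<String> rowIds);
}

/** Routes each row change either straight to the syncer or into a pending queue. */
class SyncDispatcher {
    private final SyncMode mode;
    private final IndexSyncer syncer;
    // In-memory placeholder for a persistent queue such as dbms_aq.
    private final Queue<String> pending = new ArrayDeque<>();

    SyncDispatcher(SyncMode mode, IndexSyncer syncer) {
        this.mode = mode;
        this.syncer = syncer;
    }

    /** Called once per inserted, updated or deleted row. */
    void onRowChanged(String rowId) {
        if (mode == SyncMode.REAL_TIME) {
            // Sync immediately, one row at a time.
            syncer.sync(Collections.singletonList(rowId));
        } else {
            // Defer: the row is only indexed when flush() runs.
            pending.add(rowId);
        }
    }

    /** Syncs all queued rows in a single call; returns how many were flushed. */
    int flush() {
        if (pending.isEmpty()) {
            return 0;
        }
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        syncer.sync(batch);
        return batch.size();
    }

    /** Number of row changes waiting for the next flush. */
    int pendingCount() {
        return pending.size();
    }
}
```

In lazy mode a bulk insert costs one sync call per flush instead of one per row, at the price of the index lagging behind the table until the flush runs.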
-- Joaquin

2007/7/12, Michael Goddard (JIRA) <[EMAIL PROTECTED]>:
>
> [ https://issues.apache.org/jira/browse/LUCENE-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512169 ]
>
> Michael Goddard commented on LUCENE-724:
> ----------------------------------------
>
> Marcelo,
>
> Are you still working on this? I have been experimenting with it recently --
> thank you for creating it. Do you think that the I/O might be faster if the
> Vector was replaced with BLOB I/O via InputStream, OutputStream directly?
> That is what I am working with right now, and I did observe my indexing time
> for a sample data set go from 22 seconds to 13 seconds. I do currently have
> the problem that the resulting index is not behaving correctly and am working
> on that.
>
> > Oracle JVM implementation for Lucene DataStore also a preliminary
> > implementation for an Oracle Domain index using Lucene
> > ------------------------------------------------------------------------------------------------------------------------
> >
> >              Key: LUCENE-724
> >              URL: https://issues.apache.org/jira/browse/LUCENE-724
> >          Project: Lucene - Java
> >       Issue Type: New Feature
> >       Components: Store
> > Affects Versions: 2.0.0
> >      Environment: Oracle 10g R2 with latest patchset. There is a txt
> > file in the lib directory listing the libraries required to compile this
> > extension, which for legal reasons I can't redistribute. All these libraries
> > are included in the Oracle home directory.
> >         Reporter: Marcelo F. Ochoa
> >         Priority: Minor
> >      Attachments: ojvm-01-09-07.tar.gz, ojvm-11-28-06.tar.gz,
> > ojvm-12-20-06.tar.gz, ojvm.tar.gz
> >
> >
> > Here is a preliminary implementation of the Oracle JVM Directory data store,
> > which replaces the file system with BLOB data storage.
> > The reasons to do this are:
> > - Using a traditional file system for storing the inverted index is not a
> > good option for some users.
> > - Using BLOBs for storing the inverted index while running Lucene outside
> > the Oracle database performs badly because there are a lot of network
> > round trips and data marshalling.
> > - Indexing relational data stores such as tables with VARCHAR2, CLOB or
> > XMLType with Lucene running outside the database has the same problem as
> > the previous point.
> > - The JVM included inside the Oracle database can scale up to 10.000+
> > concurrent threads without memory leaks or deadlocks, and all the operations
> > on tables are in the same memory space!!
> > With these points in mind, I uploaded the complete Lucene framework
> > inside the Oracle JVM and ran the complete JUnit test suite successfully,
> > except for some tests such as the RMI test, which requires special grants to
> > open ports inside the database.
> > The Lucene test cases run faster inside the Oracle database (11g) than
> > on the Sun JDK 1.5, because the classes are automatically JITed after some
> > executions.
> > I have implemented an OJVMDirectory Lucene Store which replaces the file
> > system storage with BLOB-based storage. Compared with a RAMDirectory
> > implementation it is a bit slower, but we get all the benefits of BLOB
> > storage (backup, concurrency control, and so on).
> > The OJVMDirectory is cloned from the source at
> > http://issues.apache.org/jira/browse/LUCENE-150 (DBDirectory) but with some
> > changes to run faster inside the Oracle JVM.
> > At this moment, I am working on a full integration with the SQL engine
> > using the Data Cartridge API, which means using Lucene as a new Oracle
> > Domain Index.
> > With this extension we can create a Lucene inverted index on a table using:
> >
> >   create index it1 on t1(f2) indextype is LuceneIndex parameters('test');
> >
> > assuming that the table t1 has a column f2 of type VARCHAR2, CLOB or
> > XMLType. After this, queries against the Lucene inverted index can be
> > made using a new Oracle operator:
> >
> >   select * from t1 where contains(f2, 'Marcelo') = 1;
> >
> > The important point here is that this query is integrated with the
> > execution plan of the Oracle database. In this simple example the Oracle
> > optimizer sees that the column "f2" is indexed with the Lucene Domain index;
> > then, using the Data Cartridge API, Java code running inside the Oracle JVM
> > is executed to open the search, fetch all the ROWIDs that match
> > "Marcelo" and get the rows using the pointer.
> > Here is the output:
> >
> >   SELECT STATEMENT ALL_ROWS 3 1 115
> >     TABLE ACCESS(BY INDEX ROWID) LUCENE.T1 3 1 115
> >       DOMAIN INDEX LUCENE.IT1
> >
> > Another benefit of using the Data Cartridge API is that if rows in table T1
> > are inserted, updated or deleted, a corresponding Java method
> > will be called to automatically update the Lucene index.
> > There is a simple HTML file with some explanation of the code.
> > The install.sql script is not fully tested and must be launched inside the
> > Oracle database, not remotely.
> > Best regards, Marcelo.
> >
> > - For Oracle users the big question is: why would I use Lucene instead of
> > Oracle Text, which is implemented in C?
> > I think the answer is simple: Lucene is open source and anybody
> > can extend it and add the functionality needed.
> > - For Lucene users who try to use Lucene as an enterprise search engine, the
> > Oracle JVM provides a highly scalable container which can scale up to
> > 10.000+ concurrent sessions, with the facility of querying tables in the
> > same memory space.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]