Re: [jira] Commented: (LUCENE-724) Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene

J. Delgado Wed, 08 Aug 2007 10:54:29 -0700

Michael, are you still working on this replacement of the BLOB I/O?

I'm looking into making the sync mode configurable: lazy syncs of DML
operations (via calls to LuceneDomainIndex.sync, potentially queued
using dbms_aq), which are convenient for bulk inserts, versus real-time
syncs for non-bulk operations where transactional data retrieval matters.
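
To make the lazy vs. real-time distinction concrete, here is a rough
sketch in plain Java. Everything in it is hypothetical: the class and
method names are invented for illustration and are not taken from the
LUCENE-724 attachments, the in-memory queue only stands in for a
dbms_aq queue, and the list only stands in for the Lucene index.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: none of these names come from the LUCENE-724 patch.
class LazySyncSketch {
    enum SyncMode { LAZY, REAL_TIME }

    // One pending change to the index, keyed by the row's ROWID.
    static final class Change {
        final String op;     // "insert", "update" or "delete"
        final String rowId;
        Change(String op, String rowId) { this.op = op; this.rowId = rowId; }
    }

    static final class IndexSyncer {
        private final SyncMode mode;
        // Stands in for a dbms_aq queue (assumption, in-memory only).
        private final Queue<Change> pending = new ArrayDeque<>();
        // Stands in for the Lucene index (assumption, just records changes).
        private final List<String> applied = new ArrayList<>();

        IndexSyncer(SyncMode mode) { this.mode = mode; }

        // Called once per DML operation on the indexed table.
        void onDml(String op, String rowId) {
            pending.add(new Change(op, rowId));
            if (mode == SyncMode.REAL_TIME) {
                sync(); // apply immediately so queries see the change
            }
        }

        // Applies every queued change in arrival order; returns how many were applied.
        int sync() {
            int n = 0;
            Change c;
            while ((c = pending.poll()) != null) {
                applied.add(c.op + ":" + c.rowId);
                n++;
            }
            return n;
        }

        int pendingCount() { return pending.size(); }
        List<String> applied() { return applied; }
    }

    public static void main(String[] args) {
        IndexSyncer bulk = new IndexSyncer(SyncMode.LAZY);
        bulk.onDml("insert", "r1");
        bulk.onDml("insert", "r2");
        System.out.println("pending before sync: " + bulk.pendingCount());
        System.out.println("applied by sync: " + bulk.sync());
    }
}
```

In lazy mode a bulk load pays for the index update once, when sync is
called; in real-time mode every DML operation pays for it right away.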


-- Joaquin

2007/7/12, Michael Goddard (JIRA) <[EMAIL PROTECTED]>:
>
>     [ https://issues.apache.org/jira/browse/LUCENE-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512169 ]
>
> Michael Goddard commented on LUCENE-724:
> ----------------------------------------
>
> Marcelo,
>
> Are you still working on this?  I have been experimenting with it recently -- 
> thank you for creating it.  Do you think that the I/O might be faster if the 
> Vector was replaced with BLOB I/O via InputStream, OutputStream directly?  
> That is what I am working with right now, and I did observe my indexing time 
> for a sample data set go from 22 seconds to 13 seconds.  I do currently have 
> the problem that the resulting index is not behaving correctly and am working 
> on that.
>
>
> > Oracle JVM implementation for Lucene DataStore also a preliminary 
> > implementation for an Oracle Domain index using Lucene
> > ------------------------------------------------------------------------------------------------------------------------
> >
> >                 Key: LUCENE-724
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-724
> >             Project: Lucene - Java
> >          Issue Type: New Feature
> >          Components: Store
> >    Affects Versions: 2.0.0
> >         Environment: Oracle 10g R2 with the latest patchset. There is a txt 
> > file in the lib directory listing the libraries required to compile this 
> > extension, which for legal reasons I can't redistribute. All of these 
> > libraries are included in the Oracle home directory.
> >            Reporter: Marcelo F. Ochoa
> >            Priority: Minor
> >         Attachments: ojvm-01-09-07.tar.gz, ojvm-11-28-06.tar.gz, 
> > ojvm-12-20-06.tar.gz, ojvm.tar.gz
> >
> >
> > Here is a preliminary implementation of the Oracle JVM Directory data 
> > store, which replaces file system storage with BLOB data storage.
> > The reasons for doing this are:
> >   - Using a traditional file system for storing the inverted index is not a 
> > good option for some users.
> >   - Using BLOBs for storing the inverted index while running Lucene outside 
> > the Oracle database performs badly because there are a lot of network 
> > round trips and data marshalling.
> >   - Indexing relational data, such as tables with VARCHAR2, CLOB or 
> > XMLType columns, with Lucene running outside the database has the same 
> > problem as the previous point.
> >   - The JVM included inside the Oracle database can scale up to 10,000+ 
> > concurrent threads without memory leaks or deadlocks, and all operations 
> > on tables happen in the same memory space.
> >   With these points in mind, I uploaded the complete Lucene framework 
> > into the Oracle JVM and ran the complete JUnit test suite successfully, 
> > except for some tests, such as the RMI test, which require special grants 
> > to open ports inside the database.
> >   The Lucene test cases run faster inside the Oracle database (11g) than 
> > on the Sun JDK 1.5, because the classes are automatically JITed after some 
> > executions.
> >   I have implemented an OJVMDirectory Lucene Store which replaces the file 
> > system storage with BLOB-based storage. Compared with a RAMDirectory 
> > implementation it is a bit slower, but we get all the benefits of BLOB 
> > storage (backup, concurrency control, and so on).
> >  The OJVMDirectory is cloned from the source at
> > http://issues.apache.org/jira/browse/LUCENE-150 (DBDirectory) but with some 
> > changes to run faster inside the Oracle JVM.
> >  At the moment, I am working on a full integration with the SQL engine 
> > using the Data Cartridge API, which means using Lucene as a new Oracle 
> > Domain Index.
> >  With this extension we can create a Lucene inverted index on a table using:
> > create index it1 on t1(f2) indextype is LuceneIndex parameters('test');
> >  assuming that the table t1 has a column f2 of type VARCHAR2, CLOB or 
> > XMLType, after this, the query against the Lucene inverted index can be 
> > made using a new Oracle operator:
> > select * from t1 where contains(f2, 'Marcelo') = 1;
> >  The important point here is that this query is integrated with the 
> > execution plan of the Oracle database. In this simple example the Oracle 
> > optimizer sees that the column "f2" is indexed with the Lucene Domain index; 
> > then, through the Data Cartridge API, Java code running inside the Oracle 
> > JVM is executed to open the search, fetch all the ROWIDs that match 
> > "Marcelo", and get the rows using those pointers.
> > Here is the output:
> > SELECT STATEMENT                                      ALL_ROWS      3       1       115
> >        TABLE ACCESS(BY INDEX ROWID) LUCENE.T1          3       1       115
> >             DOMAIN INDEX LUCENE.IT1
> >  Another benefit of using the Data Cartridge API is that if the table T1 
> > has rows inserted, updated or deleted, a corresponding Java method 
> > will be called to automatically update the Lucene index.
> >   There is a simple HTML file with some explanation of the code.
> >    The install.sql script is not fully tested and must be launched locally 
> > on the Oracle database, not remotely.
> >   Best regards, Marcelo.
> > - For Oracle users the big question is: why should I use Lucene instead of 
> > Oracle Text, which is implemented in C?
> >   I think the answer is simple: Lucene is open source, and anybody 
> > can extend it and add the functionality they need.
> > - For Lucene users who are trying to use Lucene as an enterprise search 
> > engine, the Oracle JVM provides a highly scalable container which can scale 
> > up to 10,000+ concurrent sessions, with the facility of querying tables in 
> > the same memory space.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

