[ 
https://issues.apache.org/jira/browse/LUCENE-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990538#comment-12990538
 ] 

Renaud Delbru edited comment on LUCENE-2886 at 2/4/11 12:05 PM:
----------------------------------------------------------------

Just an additional comment on semi-structured data indexing. AFOR-2 and AFOR-3 
(AFOR-3 refers to AFOR-2 with special code for allOnes frames) were able to 
beat Rice on two datasets, and S-64 beat Rice on one (but it was very close to 
Rice on the others):

DBpedia dataset (structured version of Wikipedia):

||Method||Ent||Frq||Att||Val||Pos||Total||
|AFOR-1|0.246|0.043|0.141|0.065|0.180|0.816|
|AFOR-2|0.229|0.039|0.132|0.059|0.167|0.758|
|AFOR-3|0.229|0.031|0.131|0.054|0.159|0.736|
|FOR|0.315|0.061|0.170|0.117|0.216|1.049|
|PFOR|0.317|0.044|0.155|0.070|0.205|0.946|
|Rice|0.240|0.029|0.115|0.057|0.152|0.708|
|S-64|0.249|0.041|0.133|0.062|0.171|0.791|
|VByte|0.264|0.162|0.222|0.222|0.245|1.335|

Geonames dataset:

||Method||Ent||Frq||Att||Val||Pos||Total||
|AFOR-1|0.129|0.023|0.058|0.025|0.025|0.318|
|AFOR-2|0.123|0.023|0.057|0.024|0.024|0.307|
|AFOR-3|0.114|0.006|0.056|0.016|0.008|0.256|
|FOR|0.150|0.021|0.065|0.025|0.023|0.349|
|PFOR|0.154|0.019|0.057|0.022|0.023|0.332|
|Rice|0.133|0.019|0.063|0.029|0.021|0.327|
|S-64|0.147|0.021|0.058|0.023|0.023|0.329|
|VByte|0.216|0.142|0.143|0.143|0.143|0.929|

Sindice dataset (a very heterogeneous dataset containing hundreds of thousands 
of web datasets):

||Method||Ent||Frq||Att||Val||Pos||Total||
|AFOR-1|2.578|0.395|0.942|0.665|1.014|6.537|
|AFOR-2|2.361|0.380|0.908|0.619|0.906|6.082|
|AFOR-3|2.297|0.176|0.876|0.530|0.722|5.475|
|FOR|3.506|0.506|1.121|0.916|1.440|8.611|
|PFOR|3.221|0.374|1.153|0.795|1.227|7.924|
|Rice|2.721|0.314|0.958|0.714|0.941|6.605|
|S-64|2.581|0.370|0.917|0.621|0.908|6.313|
|VByte|3.287|2.106|2.411|2.430|2.488|15.132|

Here, Ent refers to the entity id (similar to a doc id), and Att and Val are 
structural node ids.
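
To make the comparison with Rice easier to check, the Total columns can be 
compared directly. The following is only a small sketch in Python, not part of 
the attached patches. The names `totals`, `relative_to` and `wins` are 
illustrative, the numbers are copied from the tables above, and it assumes the 
figures are sizes where smaller is better, which the comment implies but does 
not state.

```python
# Totals copied from the three tables above.
# Assumption: these are sizes, so a smaller value means better compression.
totals = {
    "DBpedia":  {"AFOR-2": 0.758, "AFOR-3": 0.736, "Rice": 0.708, "S-64": 0.791},
    "Geonames": {"AFOR-2": 0.307, "AFOR-3": 0.256, "Rice": 0.327, "S-64": 0.329},
    "Sindice":  {"AFOR-2": 6.082, "AFOR-3": 5.475, "Rice": 6.605, "S-64": 6.313},
}


def relative_to(baseline, method):
    """Return {dataset: total of `method` divided by total of `baseline`}."""
    return {name: row[method] / row[baseline] for name, row in totals.items()}


def wins(baseline, method):
    """List the datasets where `method` has a strictly smaller total."""
    return [name for name, row in totals.items() if row[method] < row[baseline]]


if __name__ == "__main__":
    for method in ("AFOR-2", "AFOR-3", "S-64"):
        ratios = relative_to("Rice", method)
        summary = ", ".join(f"{d}: {r:.3f}" for d, r in ratios.items())
        print(f"{method} / Rice -> {summary}")
        print(f"  smaller than Rice on: {wins('Rice', method)}")
```

A ratio below 1.0 means the method produced a smaller total than Rice on that 
dataset.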

> Adaptive Frame Of Reference 
> ----------------------------
>
>                 Key: LUCENE-2886
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2886
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Codecs
>            Reporter: Renaud Delbru
>             Fix For: 4.0
>
>         Attachments: LUCENE-2886_simple64.patch, 
> LUCENE-2886_simple64_varint.patch, lucene-afor.tar.gz
>
>
> We could test the implementation of the Adaptive Frame Of Reference [1] on 
> the lucene-4.0 branch.
> I am providing the source code of its implementation. Some work needs to be 
> done, as this implementation currently works on the old lucene-1458 branch. 
> I will attach a tarball containing a running version (with tests) of the AFOR 
> implementation, as well as the implementations of PFOR and of Simple64 
> (a Simple-family codec working on 64-bit words) that were used in the 
> experiments in [1].
> [1] http://www.deri.ie/fileadmin/documents/deri-tr-afor.pdf
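
For readers unfamiliar with the basic technique, here is a minimal sketch of 
plain Frame Of Reference coding on a single block. It is written in Python for 
brevity and is not the code in the attached tarball. The function names 
`for_encode` and `for_decode` are hypothetical, and the adaptive choice of 
frame sizes that gives AFOR its name is described in [1] and is not shown here.

```python
def for_encode(block):
    """Encode a non-empty list of integers with one fixed bit width.

    Stores the block minimum (the "reference") and, for every value, its
    offset from that minimum packed into `width` bits.
    """
    base = min(block)
    # Bits needed for the largest offset; 0 when all values are equal.
    width = max((v - base).bit_length() for v in block)
    packed = 0
    for i, v in enumerate(block):
        packed |= (v - base) << (i * width)
    return base, width, len(block), packed


def for_decode(base, width, count, packed):
    """Inverse of for_encode: unpack `count` offsets and add the reference."""
    mask = (1 << width) - 1
    return [base + ((packed >> (i * width)) & mask) for i in range(count)]


if __name__ == "__main__":
    block = [100, 103, 101, 107]
    encoded = for_encode(block)
    print(encoded)
    print(for_decode(*encoded))
```

In this sketch a block whose values are all identical, for example all ones, 
ends up with a width of 0 and an empty payload. Whether the allOnes special 
case in AFOR-3 works this way is not stated in this issue.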

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
