[ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239579#comment-13239579 ]
Christian Moen edited comment on SOLR-3282 at 3/27/12 3:59 PM:
---------------------------------------------------------------
h5. Test 1: Indexing Japanese Wikipedia

I've extracted text fairly accurately from Japanese Wikipedia and removed all the gory markup, so the content is clean. There are 1,443,764 documents in total, a mix of short and very long documents. These have been converted to files in Solr XML format, with 1,000 documents per file. I'm posting each file using

{noformat}
curl -s http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=UTF-8' --data-binary @solrxml/SolrXml-171.xml
{noformat}

and committing after all the files have been posted with

{noformat}
curl -s http://localhost:8983/solr/update -F 'stream.body=<commit />'
{noformat}

Posting the entire Wikipedia as a single file would perhaps be a lot faster. Posting took

{noformat}
real 18m39.206s
user 0m12.682s
sys 0m11.065s
{noformat}

The GC log looks fine. There wasn't even a full GC, probably due to the large heap size. I'm attaching these files:

|| Filename || Description ||
| jawiki-index-gc.log | GC log |
| jawiki-index-gcviewer.png | Screenshot from GCViewer |
| jawiki-index-visualvm.png | Screenshot from VisualVM |
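The posting procedure above can be sketched as a small script; this is a minimal sketch, assuming the batch files follow the {{solrxml/SolrXml-*.xml}} naming seen in the curl example, with a single commit at the end as described:

```shell
#!/usr/bin/env bash
# Hedged sketch: post every Solr XML batch file to the update handler,
# then issue one commit at the end. The solrxml/SolrXml-*.xml naming
# is taken from the example above; adjust paths for your setup.
post_all() {
  for f in solrxml/SolrXml-*.xml; do
    # Each file holds 1,000 documents in Solr XML format.
    curl -s http://localhost:8983/solr/update \
      -H 'Content-type:text/xml; charset=UTF-8' \
      --data-binary @"$f"
  done
  # Committing once after all posts is much cheaper than per-file commits.
  curl -s http://localhost:8983/solr/update -F 'stream.body=<commit />'
}
```

Deferring the commit keeps segment churn down during the bulk load, which matters at this document count.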
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>
> Kuromoji might be used by many, and also in mission-critical systems. I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all Japanese Wikipedia documents (approx. 1.4M documents) in a never-ending loop
> # Simultaneously run many tens of thousands of typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage look stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.

--
This message is automatically generated by JIRA.
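The query-load half of the test plan above could be driven by a script like the following; this is a minimal sketch under stated assumptions: the {{queries.txt}} file, the {{text}} field name, and the default {{/select}} handler are all hypothetical, and a quarter-second sleep approximates the 3-5 queries per second target:

```shell
#!/usr/bin/env bash
# Hedged sketch: replay a file of typical Japanese queries against a
# field analyzed with text_ja, with highlighting on, at roughly 4 qps.
# queries.txt (one query per line) and the "text" field are assumptions.
run_queries() {
  while IFS= read -r q; do
    curl -s --get http://localhost:8983/solr/select \
      --data-urlencode "q=text:${q}" \
      --data-urlencode 'hl=true'
    sleep 0.25   # ~4 queries per second
  done < queries.txt
}
```

Wrapping this in an outer loop over the query file would give the sustained load the test calls for.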