[ https://issues.apache.org/jira/browse/SOLR-8582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15112530#comment-15112530 ]
Shalin Shekhar Mangar commented on SOLR-8582:
---------------------------------------------
Thanks Noble. Your patch fixes the slowdown:
{code}
./bin/solr start -e schemaless -m 2g
time curl 'http://localhost:8983/solr/gettingstarted/update' \
  --data-binary @/solr-data/imdb.json
{"responseHeader":{"status":0,"QTime":195917}}
real 3m16.231s
user 0m0.274s
sys 0m0.681s
./bin/solr stop
rm -r example/schemaless
./bin/solr start -e schemaless -m 2g
time curl 'http://localhost:8983/solr/gettingstarted/update/json/docs' \
  --data-binary @/solr-data/imdb.json
{"responseHeader":{"status":0,"QTime":192269}}
real 3m12.596s
user 0m0.268s
sys 0m0.721s
{code}
Memory consumption has also gone down: I can now index the same file with a
512m heap. There is still some memory pressure, but it is not that bad. For
example, the following runs use 512m of heap (the bin/solr default when -m is
omitted):
{code}
./bin/solr start -e schemaless
time curl 'http://localhost:8983/solr/gettingstarted/update/json/docs' \
  --data-binary @/solr-data/imdb.json
{"responseHeader":{"status":0,"QTime":244608}}
real 4m4.924s
user 0m0.294s
sys 0m0.780s
./bin/solr stop
rm -r example/schemaless
./bin/solr start -e schemaless
time curl 'http://localhost:8983/solr/gettingstarted/update' \
  --data-binary @/solr-data/imdb.json
{"responseHeader":{"status":0,"QTime":231332}}
real 3m51.638s
user 0m0.291s
sys 0m0.745s
{code}
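To put the numbers side by side, here is a small shell sketch that divides the /update/json/docs QTime by the /update QTime for each pair of runs reported in this issue. The `qtime_ratio` helper is hypothetical (not part of Solr or bin/post); it only uses awk because POSIX sh has no floating-point arithmetic.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr): print a/b for two QTime values
# in milliseconds, rounded to two decimals. awk does the floating point.
qtime_ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# QTime values copied from the runs in this issue:
# first argument = /update/json/docs, second argument = /update.
echo "trunk, before patch: $(qtime_ratio 686264 161869)"
echo "patched, -m 2g:      $(qtime_ratio 192269 195917)"
echo "patched, 512m heap:  $(qtime_ratio 244608 231332)"
```

This prints ratios of 4.24, 0.98 and 1.06 respectively: with the patch the two endpoints are roughly on par at both heap sizes, compared with the roughly 4x gap on trunk before the fix.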
Minor nit: JsonRecordReader#handleObjectStart has an unused argument,
childrenFound.
> /update/json/docs is 4x slower than /update for indexing a list of json docs
> ----------------------------------------------------------------------------
>
> Key: SOLR-8582
> URL: https://issues.apache.org/jira/browse/SOLR-8582
> Project: Solr
> Issue Type: Bug
> Components: update
> Reporter: Shalin Shekhar Mangar
> Fix For: 5.5, Trunk
>
> Attachments: SOLR-8582.patch, SOLR-8582.patch
>
>
> Indexing a ~650 MB JSON file containing a list of 2.2 million JSON documents,
> I found that bin/post had become 4x slower after SOLR-7042. Memory
> consumption has also gone up, and I can no longer index this file with a
> 512 MB heap.
> The difference arises because bin/post now defaults to /update/json/docs
> instead of /update. This can be verified on trunk:
> {code}
> time curl 'http://localhost:8983/solr/gettingstarted/update' \
>   --data-binary @/hdd/solr-data/imdb.json
> {"responseHeader":{"status":0,"QTime":161869}}
>
> real 2m42.044s
> user 0m0.292s
> sys 0m0.493s
>
> time curl 'http://localhost:8983/solr/gettingstarted/update/json/docs' \
>   --data-binary @/hdd/solr-data/imdb.json
> {"responseHeader":{"status":0,"QTime":686264}}
>
> real 11m26.478s
> user 0m0.324s
> sys 0m0.552s
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)