I’m seeing multiple red flags for performance here. The top ones are “DIH”,
“MongoDB”, and “SQL on MongoDB”. MongoDB is not a relational database.

Our multi-threaded extractor using the Mongo API was still three times slower
than the same approach on MySQL.
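
As a rough illustration of the multi-threaded approach (a minimal sketch, not the extractor described above): the work is split into batches and a thread pool fetches and sends them in parallel. `fetch_batch` and `send_to_solr` are hypothetical stand-ins; a real version would read from MongoDB through its driver and post documents to Solr's update handler. The batch size and worker count are assumptions to tune.

```python
from concurrent.futures import ThreadPoolExecutor


def id_ranges(total, batch_size):
    """Split [0, total) into (start, end) batches."""
    return [(s, min(s + batch_size, total)) for s in range(0, total, batch_size)]


def fetch_batch(start, end):
    # Hypothetical stand-in: a real extractor would query MongoDB here.
    return [{"id": str(i)} for i in range(start, end)]


def send_to_solr(docs):
    # Hypothetical stand-in: a real extractor would post docs to Solr here.
    return len(docs)


def extract_and_index(batch):
    """Fetch one batch and send it on; returns the number of docs handled."""
    start, end = batch
    return send_to_solr(fetch_batch(start, end))


def run(total, batch_size=1000, workers=8):
    # Each worker thread handles whole batches, so fetches and sends overlap.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(extract_and_index, id_ranges(total, batch_size)))


indexed = run(200_000)
print(f"indexed {indexed} documents")
```

Threads suit this kind of job because most of the time is spent waiting on the network, not on CPU.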

Check the CPU usage on the Solr hosts while you are indexing. If it is under 
50%, the bottleneck is MongoDB and single-threaded indexing.

For another check, run that same query in a regular database client and time it.
The Solr indexing will never be faster than that.
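
One way to do that timing, sketched under assumptions: drain the result set without doing anything else and measure rows per second. `fake_rows` is a hypothetical stand-in for the real cursor; swap in whatever your database client returns for the DIH query.

```python
import time


def time_fetch(rows):
    """Drain an iterable of rows; return (row_count, elapsed_seconds)."""
    start = time.perf_counter()
    count = 0
    for _ in rows:
        count += 1
    return count, time.perf_counter() - start


def fake_rows(n):
    # Hypothetical stand-in for a real database cursor.
    for i in range(n):
        yield {"idStr": str(i)}


count, elapsed = time_fetch(fake_rows(1000))
rate = count / elapsed if elapsed > 0 else float("inf")
print(f"{count} rows in {elapsed:.3f}s ({rate:.0f} rows/s)")
```

For comparison, 200K documents in 3.5 hours is roughly 16 documents per second, so a fetch rate near that number points at the source query and not at Solr.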

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Aug 17, 2020, at 11:58 AM, Abhijit Pawar <aapawar.s...@gmail.com> wrote:
> 
> Sure Divye,
> 
> *Here's the config.....*
> 
> *conf/solr-config.xml:*
> 
> <lib dir="../../../../dist/" regex="solr-dataimporthandler-.*\.jar" />
> <!-- DIH Starts -->
> <requestHandler name="/dataimport"
> class="org.apache.solr.handler.dataimport.DataImportHandler">
> <lst name="defaults">
> <str
> name="config">/home/ec2-user/solr/solr-5.4.1/server/solr/test_core/conf/dataimport/data-source-config.xml</str>
> 
> </lst>
> </requestHandler>
> <!-- DIH ends -->
> 
> *schema.xml:*
> has all of the field definitions
> 
> *conf/dataimport/data-source-config.xml*
> 
> <dataConfig>
> <dataSource name="mongod" type="JdbcDataSource"
> driver="com.mongodb.jdbc.MongoDriver" url="mongodb://<<IP ADDRESS>>:27017/<<DB>>"/>
> <document name="products">
> <entity name="products"
> dataSource="mongod"
> transformer="<<Custom Transformer>>,TemplateTransformer"
> onError="continue"
> pk="uuid"
> query="SELECT field1,field2,field3,...... FROM products"
> deltaImportQuery="SELECT field1,field2,field3,...... FROM products WHERE
> orgidStr = '${dataimporter.request.orgid}' AND idStr =
> '${dataimporter.delta.idStr}'"
> deltaQuery="SELECT idStr FROM products WHERE orgidStr =
> '${dataimporter.request.orgid}' AND updatedAt >
> '${dataimporter.last_index_time}'"
> >
> <field column="field1" name="fieldName1"/>
> <field column="field2" name="fieldName2"/>
> <field column="field3" name="fieldName3"/>
> <entity name="categories">
> .
> .
> . 4-5 more nested entities.......
> 
> On Mon, Aug 17, 2020 at 1:32 PM Divye Handa <divye.handaa...@gmail.com>
> wrote:
> 
>> Can you share the dih configuration you are using for same?
>> 
>> On Mon, 17 Aug, 2020, 23:52 Abhijit Pawar, <aapawar.s...@gmail.com> wrote:
>> 
>>> Hello,
>>> 
>>> We are indexing some 200K plus documents in SOLR 5.4.1 with no shards /
>>> replicas and just single core.
>>> It takes almost 3.5 hours to index that data.
>>> I am using a data import handler to import data from the mongo database.
>>> 
>>> Is there something we can do to reduce the time taken to index?
>>> Will upgrade to newer version help?
>>> 
>>> Appreciate your help!
>>> 
>>> Regards,
>>> Abhijit
>>> 
>> 
