[ https://issues.apache.org/jira/browse/SOLR-12854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amrit Sarkar updated SOLR-12854:
--------------------------------
    Issue Type: Improvement  (was: Bug)

> Document steps to improve delta import via DataImportHandler
> -------------------------------------------------------------
>
>                 Key: SOLR-12854
>                 URL: https://issues.apache.org/jira/browse/SOLR-12854
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: contrib - DataImportHandler
>    Affects Versions: 7.5
>            Reporter: Amrit Sarkar
>            Priority: Major
>
> Delta imports in DataImportHandler are sometimes slower than full imports,
> because a delta import issues multiple queries where a full import issues
> one, which makes it more time-consuming. This is described in:
> https://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport
> On the mailing list, at
> http://lucene.472066.n3.nabble.com/Number-of-requests-spike-up-when-i-do-the-delta-Import-td4338162.html
> a Solr user has described a workaround that works well and improves delta
> import performance: specify ${dataimporter.last_index_time} in the
> deltaImportQuery rather than in the deltaQuery.
> {code}
> I found a hacky way to limit the number of times deltaImportQuery was
> executed.
> As designed, solr executes deltaQuery to get a list of ids that need to be
> indexed. For each of those, it executes deltaImportQuery, which is typically
> very similar to the full query.
> I constructed a deltaQuery to purposely only return 1 row. E.g.
> deltaQuery = "SELECT id FROM table WHERE rownum=1" // written for
> Oracle, likely requires a different syntax for other dbs. Also, it occurred
> to me that you could probably include the
> date >= '${dataimporter.last_index_time}' filter here so this returns 0 rows
> if no data has changed.
> Since deltaImportQuery now only gets called once, I needed to add the filter
> logic to deltaImportQuery to select only the changed rows (that logic is
> normally in deltaQuery). E.g.
> deltaImportQuery = [normal import query] WHERE date >= '${dataimporter.last_index_time}'
> {code}
> A number of other users have adopted this strategy and seen DIH delta import
> performance improve, so documenting it as a TIP will help other users too.
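
For illustration, the workaround quoted in the issue might look roughly like the following data-config.xml. This is a minimal sketch, not a tested configuration: the table name (item), its columns (id, name, last_modified), the JDBC URL, and the credentials are hypothetical placeholders, and the rownum syntax is Oracle-specific, as the quoted mail notes. Only the entity attributes (query, deltaQuery, deltaImportQuery) and the ${dataimporter.last_index_time} variable come from DIH itself.

```xml
<dataConfig>
  <!-- Hypothetical Oracle connection; replace driver, url, user and password. -->
  <dataSource type="JdbcDataSource"
              driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@//dbhost.example.com:1521/EXAMPLE"
              user="solr"
              password="CHANGE_ME"/>
  <document>
    <!--
      deltaQuery is written to return at most one row, so deltaImportQuery
      runs at most once per delta import. The last_index_time filter makes it
      return zero rows when nothing has changed (the optional refinement
      suggested in the quoted mail).

      deltaImportQuery carries the change filter that would normally live in
      deltaQuery, so that single execution selects every changed row.

      The comparison of last_modified against a quoted timestamp follows the
      form used in the quoted mail; whether it needs an explicit conversion
      depends on the database and is an assumption here.
    -->
    <entity name="item" pk="id"
            query="SELECT id, name, last_modified FROM item"
            deltaQuery="SELECT id FROM item
                        WHERE rownum = 1
                          AND last_modified &gt;= '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, name, last_modified FROM item
                              WHERE last_modified &gt;= '${dataimporter.last_index_time}'">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="last_modified" name="last_modified"/>
    </entity>
  </document>
</dataConfig>
```

With a configuration along these lines, a delta import (command=delta-import) would issue two queries in total instead of one deltaImportQuery per changed id, which is the saving the quoted mail describes.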