[
https://issues.apache.org/jira/browse/SOLR-7061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303038#comment-14303038
]
Noble Paul commented on SOLR-7061:
----------------------------------
bq. We want to cache as little data as possible for each function. ContextImpl
uses a HashMap<String, Object> to store document-level data in DocWrapper for
all entities, whether or not any function will use it, which may
cache unnecessary values in memory.
I fail to understand this fully. The context only stores the values you
explicitly set. So where are the unnecessary values coming from?
> Cross-Entity Variable Resolving and Arguments for ScriptTransformer Functions
> -----------------------------------------------------------------------------
>
> Key: SOLR-7061
> URL: https://issues.apache.org/jira/browse/SOLR-7061
> Project: Solr
> Issue Type: Improvement
> Components: contrib - DataImportHandler
> Affects Versions: 4.10.3
> Reporter: Mark Peng
> Priority: Minor
> Labels: dataimport, transformers
> Attachments: SOLR-7061.patch
>
>
> Script Transformer has been widely used to modify the column values of
> rows selected from the target data source (such as an SQL database) based on
> specific logic, before they are written to Solr as documents. However, the
> current implementation has the following limitations:
> *1. It is not possible to pass constant values or resolved variables (e.g.,
> $\{TABLE.COLUMN\} ) as arguments to a script function.*
> *2. Cross-entity row data exchange is not possible either.*
> In our use case, we have complex nested entities and rely heavily on
> script functions to transform table rows during data import. Sometimes,
> for a single document, we need to bring selected column values from a
> parent entity into the current entity to transform values and apply
> if-else logic. To achieve this, we have to join with other tables in the
> SQL of the current entity, which is quite resource-consuming, especially for
> large tables.
> Therefore, we have made some improvements that allow us to pass selected
> column values from entity A to another entity B as function arguments by
> using the variable resolver.
> Here is an example of how it works. Suppose we have the following
> configuration:
> {code}
> <dataConfig>
>   <dataSource name="ProductDB"
>               driver="oracle.jdbc.driver.OracleDriver"
>               url="jdbc:oracle:thin:@${dataimporter.request.host}:${dataimporter.request.port}/${dataimporter.request.name}"
>               user="${dataimporter.request.user}"
>               password="${dataimporter.request.password}"
>               autoCommit="true"/>
>   <!-- ScriptTransformer functions -->
>   <script><![CDATA[
>     function processItemRow(row, resolvedVars) {
>       var isOnSale = resolvedVars.get("${PRODUCT.IS_ONSALE}");
>       var discount = resolvedVars.get("${PRODUCT.DISCOUNT_RATE}");
>       var price = row.get("PRICE");
>
>       if (isOnSale) {
>         row.put("PRICE", price * discount);
>       } else {
>         row.put("PRICE", price);
>       }
>
>       return row;
>     }
>   ]]>
>   </script>
>   <document name="EC_SHOP">
>     <entity dataSource="ProductDB" name="PRODUCT"
>             query="SELECT PRODUCT_ID, TITLE, IS_ONSALE, DISCOUNT_RATE FROM PRODUCT">
>       <field column="PRODUCT_ID" name="PRODUCT_ID"/>
>       <field column="TITLE" name="TITLE"/>
>       <field column="IS_ONSALE" name="IS_ONSALE"/>
>       <field column="DISCOUNT_RATE" name="DISCOUNT_RATE"/>
>
>       <entity dataSource="ProductDB" name="ITEM"
>               transformer="script:processItemRow(${PRODUCT.IS_ONSALE},${PRODUCT.DISCOUNT_RATE})"
>               query="SELECT PRICE FROM ITEM WHERE PRODUCT_ID = '${PRODUCT.PRODUCT_ID}'">
>         <field column="PRICE" name="PRICE"/>
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
> {code}
> As demonstrated above, we can now access the values of columns
> *IS_ONSALE* and *DISCOUNT_RATE* of table *PRODUCT* from the entity for table
> *ITEM* by passing *$\{PRODUCT.IS_ONSALE\}* and *$\{PRODUCT.DISCOUNT_RATE\}*
> as arguments to the function *processItemRow*, to determine whether to apply
> a discount to the product price. The function signature has a
> second argument (named *resolvedVars* here) that carries the map of column
> values resolved from the preceding entities.
> This improvement gives script functions more flexibility to exchange row
> data across entities (even across data sources) and to do more complex
> processing of entity rows.
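
The script function from the description can be tried outside DIH with a small stand-in for the Java maps. This is only a sketch under assumptions: `RowMap` is a hypothetical helper that mimics the `get`/`put` calls used above, and keying `resolvedVars` by the unresolved expression follows the example in the description rather than anything verified against the patch.

```javascript
// Hypothetical stand-in for the java.util.Map objects that a script
// function would receive; it only supports the get/put calls used below.
class RowMap {
  constructor(entries) {
    this.data = new Map(Object.entries(entries));
  }
  get(key) {
    return this.data.get(key);
  }
  put(key, value) {
    this.data.set(key, value);
    return value;
  }
}

// Same logic as the function in the description. The keys of resolvedVars
// are assumed to be the unresolved expressions, as in that example.
function processItemRow(row, resolvedVars) {
  var isOnSale = resolvedVars.get("${PRODUCT.IS_ONSALE}");
  var discount = resolvedVars.get("${PRODUCT.DISCOUNT_RATE}");
  var price = row.get("PRICE");

  if (isOnSale) {
    row.put("PRICE", price * discount);
  } else {
    row.put("PRICE", price);
  }
  return row;
}

// Values a parent PRODUCT row might resolve to (made-up sample data).
var vars = new RowMap({
  "${PRODUCT.IS_ONSALE}": true,
  "${PRODUCT.DISCOUNT_RATE}": 0.5
});
var item = processItemRow(new RowMap({ PRICE: 200 }), vars);
console.log(item.get("PRICE")); // prints 100
```

One thing to watch: if the resolved values reach the script as strings, which the description does not specify, `if (isOnSale)` would also pass for the string "false", because any non-empty string is truthy in JavaScript.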