[ 
https://issues.apache.org/jira/browse/NIFI-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Kawamura updated NIFI-3248:
--------------------------------
    Description: 
GetSolr holds the last query timestamp so that it only fetches documents that 
have been added or updated since the last query.
However, GetSolr misses some of those updated documents, and once a document's 
date field value becomes older than the last query timestamp, GetSolr can no 
longer query that document.

This JIRA is for tracking the investigation of this behavior and the 
discussion of it.
Here are possible causes of this behavior:

|#|Short description|Should we address it?|
|1|Timestamp range filter, curly or square bracket?|No|


- However, since the timestamp filter is not properly formatted as a valid time 
range filter, GetSolr never fetches newly added documents.-
Although the timestamp range filter does not open with a square bracket, 
GetSolr does seem to fetch newly added documents, so I lowered the priority.

The code has been the same in the [0.5 
branch|https://github.com/apache/nifi/blob/support/nifi-0.5.x/nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java#L202],
 so it seems it hasn't been working as expected.

{code}
        // if initialized then apply a filter to restrict results from the last end time til now
        if (initialized) {
            StringBuilder filterQuery = new StringBuilder();
            filterQuery.append(context.getProperty(DATE_FIELD).getValue())
                    // This should be a square bracket :[
                    .append(":{").append(lastEndDatedRef.get()).append(" TO ")
                    .append(currDate).append("]");
            solrQuery.addFilterQuery(filterQuery.toString());
            logger.info("Applying filter query {}", new Object[]{filterQuery.toString()});
        }
{code}
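
For reference, Solr's standard query parser treats a curly bracket as an exclusive bound and a square bracket as an inclusive one, and the two can be mixed in a single range. The following is a minimal sketch of how the two filter variants discussed here differ; it is not the GetSolr code. The class name RangeFilterSketch, the method buildRangeFilter, the field name "created", and the timestamps are all hypothetical.

```java
public class RangeFilterSketch {

    /**
     * Builds a Solr range filter string such as "created:{A TO B]".
     * The upper bound is always inclusive (square bracket), as in the
     * snippet above; only the lower bound bracket is switched.
     */
    public static String buildRangeFilter(String dateField, String from,
                                          String to, boolean includeLower) {
        // "{" excludes the lower bound, "[" includes it
        String open = includeLower ? "[" : "{";
        return dateField + ":" + open + from + " TO " + to + "]";
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for the last end time and the current time
        String lastEnd = "2016-12-26T00:00:00Z";
        String now = "2016-12-26T00:05:00Z";

        // Same shape as the filter GetSolr builds today (exclusive lower bound)
        System.out.println(buildRangeFilter("created", lastEnd, now, false));
        // The variant with a square bracket (inclusive lower bound)
        System.out.println(buildRangeFilter("created", lastEnd, now, true));
    }
}
```

With the curly bracket, a document whose date field is exactly equal to the last end time is excluded from the range; with the square bracket it is included.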

  was:
GetSolr holds the last query timestamp so that it only fetches documents that 
have been added or updated since the last query.

- However, since the timestamp filter is not properly formatted as a valid time 
range filter, GetSolr never fetches newly added documents.-
Although the timestamp range filter does not open with a square bracket, 
GetSolr does seem to fetch newly added documents, so I lowered the priority.

The code has been the same in the [0.5 
branch|https://github.com/apache/nifi/blob/support/nifi-0.5.x/nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java#L202],
 so it seems it hasn't been working as expected.

{code}
        // if initialized then apply a filter to restrict results from the last end time til now
        if (initialized) {
            StringBuilder filterQuery = new StringBuilder();
            filterQuery.append(context.getProperty(DATE_FIELD).getValue())
                    // This should be a square bracket :[
                    .append(":{").append(lastEndDatedRef.get()).append(" TO ")
                    .append(currDate).append("]");
            solrQuery.addFilterQuery(filterQuery.toString());
            logger.info("Applying filter query {}", new Object[]{filterQuery.toString()});
        }
{code}


> GetSolr cannot query newly added documents
> ------------------------------------------
>
>                 Key: NIFI-3248
>                 URL: https://issues.apache.org/jira/browse/NIFI-3248
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.0.0, 0.5.0, 0.6.0, 0.5.1, 0.7.0, 0.6.1, 1.1.0, 0.7.1, 
> 1.0.1
>            Reporter: Koji Kawamura
>            Priority: Minor
>         Attachments: nifi-flow.png, query-result-with-curly-bracket.png, 
> query-result-with-square-bracket.png
>
>
> GetSolr holds the last query timestamp so that it only fetches documents 
> that have been added or updated since the last query.
> However, GetSolr misses some of those updated documents, and once a 
> document's date field value becomes older than the last query timestamp, 
> GetSolr can no longer query that document.
> This JIRA is for tracking the investigation of this behavior and the 
> discussion of it.
> Here are possible causes of this behavior:
> |#|Short description|Should we address it?|
> |1|Timestamp range filter, curly or square bracket?|No|
> - However, since the timestamp filter is not properly formatted as a valid 
> time range filter, GetSolr never fetches newly added documents.-
> Although the timestamp range filter does not open with a square bracket, 
> GetSolr does seem to fetch newly added documents, so I lowered the priority.
> The code has been the same in the [0.5 
> branch|https://github.com/apache/nifi/blob/support/nifi-0.5.x/nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java#L202],
>  so it seems it hasn't been working as expected.
> {code}
>         // if initialized then apply a filter to restrict results from the last end time til now
>         if (initialized) {
>             StringBuilder filterQuery = new StringBuilder();
>             filterQuery.append(context.getProperty(DATE_FIELD).getValue())
>                     // This should be a square bracket :[
>                     .append(":{").append(lastEndDatedRef.get()).append(" TO ")
>                     .append(currDate).append("]");
>             solrQuery.addFilterQuery(filterQuery.toString());
>             logger.info("Applying filter query {}", new Object[]{filterQuery.toString()});
>         }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)