[ https://issues.apache.org/jira/browse/NIFI-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Kawamura updated NIFI-3248:
--------------------------------
Description:
GetSolr holds the last query timestamp so that it only fetches documents that
have been added or updated since the last query.
However, GetSolr misses some of those updated documents, and once a document's
date field value becomes older than the last query timestamp, GetSolr can no
longer query that document.
This JIRA tracks the investigation of this behavior and the related discussion.
Here are possible causes of this behavior:
|#|Short description|Should we address it?|
|1|Timestamp range filter, curly or square bracket?|No|
1. Timestamp range filter, curly or square bracket?
At first glance, mixing curly and square brackets looked strange
([source code|https://github.com/apache/nifi/blob/support/nifi-0.5.x/nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java#L202]).
But the difference is meaningful.
A square bracket on a range query is inclusive and a curly bracket is
exclusive. If we used inclusive bounds on both sides and a document had a timestamp
exactly on the boundary, it could be returned by two consecutive
executions, and we only want it in one.
This is intentional, and it should stay as it is.
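To make the boundary argument concrete, here is a minimal sketch in plain Java (no Solr involved) that models consecutive GetSolr runs as time windows and counts how many windows contain a document whose timestamp sits exactly on a run boundary. The class and method names (`RangeBoundaryDemo`, `inWindow`, `countMatches`) and the numeric timestamps are illustrative assumptions, not part of GetSolr.

{code:java}
import java.util.List;

public class RangeBoundaryDemo {

    // Hypothetical helper: is ts inside the window?
    // lowerInclusive = true models '[', false models '{'; same idea for the upper bound.
    static boolean inWindow(long ts, long lower, long upper,
                            boolean lowerInclusive, boolean upperInclusive) {
        boolean aboveLower = lowerInclusive ? ts >= lower : ts > lower;
        boolean belowUpper = upperInclusive ? ts <= upper : ts < upper;
        return aboveLower && belowUpper;
    }

    // Counts how many consecutive windows (t0..t1), (t1..t2), ... contain ts,
    // where each window is one execution from the previous end time to "now".
    static int countMatches(long ts, List<Long> runTimes,
                            boolean lowerInclusive, boolean upperInclusive) {
        int matches = 0;
        for (int i = 1; i < runTimes.size(); i++) {
            if (inWindow(ts, runTimes.get(i - 1), runTimes.get(i),
                         lowerInclusive, upperInclusive)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Long> runTimes = List.of(0L, 100L, 200L); // three executions (illustrative)
        long boundaryDoc = 100L;                        // timestamp exactly on a run boundary
        System.out.println("{last TO now] matches: "
                + countMatches(boundaryDoc, runTimes, false, true));
        System.out.println("[last TO now] matches: "
                + countMatches(boundaryDoc, runTimes, true, true));
    }
}
{code}

Running `main` prints one match for `{last TO now]` and two for `[last TO now]`: with an exclusive lower bound the boundary document is picked up only by the execution whose window ends at its timestamp, while inclusive bounds on both sides deliver it twice.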
- However, since the timestamp filter is not properly formatted as a valid time
range filter, GetSolr never fetches newly added documents.-
Although the timestamp range filter is not properly formatted with square
brackets, GetSolr seems to manage to fetch newly added documents, so I lowered
the priority.
The code has been the same since the 0.5 branch, so it seems it hasn't been
working as expected.
{code}
// if initialized then apply a filter to restrict results from the last end time til now
if (initialized) {
    StringBuilder filterQuery = new StringBuilder();
    filterQuery.append(context.getProperty(DATE_FIELD).getValue())
            // This should be a square bracket :[
            .append(":{").append(lastEndDatedRef.get()).append(" TO ")
            .append(currDate).append("]");
    solrQuery.addFilterQuery(filterQuery.toString());
    logger.info("Applying filter query {}", new Object[]{filterQuery.toString()});
}
{code}
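For reference, the string that the snippet above passes to `solrQuery.addFilterQuery` can be reproduced with a small standalone sketch. `buildFilterQuery`, the field name `last_modified`, and the timestamps are hypothetical; in GetSolr the field comes from the `DATE_FIELD` property and the bounds from `lastEndDatedRef` and `currDate`.

{code:java}
public class FilterQueryDemo {

    // Hypothetical helper: builds the same range filter string as the GetSolr
    // snippet, with an exclusive lower bound ('{') and an inclusive upper bound (']').
    static String buildFilterQuery(String dateField, String lastEndDate, String currDate) {
        return new StringBuilder()
                .append(dateField)
                .append(":{").append(lastEndDate)
                .append(" TO ")
                .append(currDate).append("]")
                .toString();
    }

    public static void main(String[] args) {
        // Illustrative field name and ISO-8601 timestamps (assumptions, not taken from GetSolr)
        String fq = buildFilterQuery("last_modified",
                "2016-12-20T00:00:00Z", "2016-12-21T00:00:00Z");
        System.out.println(fq); // last_modified:{2016-12-20T00:00:00Z TO 2016-12-21T00:00:00Z]
    }
}
{code}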
was:
GetSolr holds the last query timestamp so that it only fetches documents that
have been added or updated since the last query.
However, GetSolr misses some of those updated documents, and once a document's
date field value becomes older than the last query timestamp, GetSolr can no
longer query that document.
This JIRA tracks the investigation of this behavior and the related discussion.
Here are possible causes of this behavior:
|#|Short description|Should we address it?|
|1|Timestamp range filter, curly or square bracket?|No|
- However, since the timestamp filter is not properly formatted as a valid time
range filter, GetSolr never fetches newly added documents.-
Although the timestamp range filter is not properly formatted with square
brackets, GetSolr seems to manage to fetch newly added documents, so I lowered
the priority.
The code has been the same in the [0.5 branch|https://github.com/apache/nifi/blob/support/nifi-0.5.x/nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java#L202],
so it seems it hasn't been working as expected.
{code}
// if initialized then apply a filter to restrict results from the last end time til now
if (initialized) {
    StringBuilder filterQuery = new StringBuilder();
    filterQuery.append(context.getProperty(DATE_FIELD).getValue())
            // This should be a square bracket :[
            .append(":{").append(lastEndDatedRef.get()).append(" TO ")
            .append(currDate).append("]");
    solrQuery.addFilterQuery(filterQuery.toString());
    logger.info("Applying filter query {}", new Object[]{filterQuery.toString()});
}
{code}
> GetSolr cannot query newly added documents
> ------------------------------------------
>
> Key: NIFI-3248
> URL: https://issues.apache.org/jira/browse/NIFI-3248
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.0.0, 0.5.0, 0.6.0, 0.5.1, 0.7.0, 0.6.1, 1.1.0, 0.7.1,
> 1.0.1
> Reporter: Koji Kawamura
> Priority: Minor
> Attachments: nifi-flow.png, query-result-with-curly-bracket.png,
> query-result-with-square-bracket.png
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)