[jira] [Comment Edited] (NIFI-14162) QueryRecord not following column order

Daniel Stieglitz (Jira) Mon, 28 Apr 2025 13:02:04 -0700


    [ 
https://issues.apache.org/jira/browse/NIFI-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17947923#comment-17947923
 ]


Daniel Stieglitz edited comment on NIFI-14162 at 4/28/25 8:01 PM:
------------------------------------------------------------------

[~exceptionfactory] Taking a closer look at this issue, I noticed that this is 
not a Calcite issue; I believe it is our issue. When using the "Infer Schema" 
schema access strategy, I noticed that the inferred schema differs between the 
case with one record and the case with two.

For the single record the schema (when calling toString on the RecordSchema 
object) is
{code:java}
["ArticleCode" : "STRING", "ProductCode" : "STRING", "ArticleName" : "STRING", 
"ProductName" : "STRING", "Country" : "STRING"]{code}
For the two records the schema is
{code:java}
["ArticleCode" : "STRING", "ArticleName" : "STRING", "ProductCode" : "STRING", 
"ProductName" : "STRING", "Country" : "STRING"]{code}
Note how the ordering of the two schemas differs, i.e. whether the second 
column is ProductCode or ArticleName. This difference in ordering determines 
which values are retrieved. In the class RecordDataSource, on lines 79 and 89, 
the row returned for the query is the record retrieved from the 
JsonTreeReader, with its values in the exact order in which the fields are 
defined in the schema. These column values are not mapped to the query 
columns. Hence, in the first case (one record) incorrect values are retrieved 
because the query columns are not aligned with the schema columns, while in 
the second case (two records) all the values are correct because the query 
columns are aligned with the schema.
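
The positional behavior described above can be illustrated with a minimal, 
self-contained sketch. It does not use any NiFi or Calcite API; the class 
PositionalRowDemo and its two helper methods are hypothetical names introduced 
only for illustration, and the sketch assumes that the row is simply labeled 
by position with the query columns.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; it does not call any NiFi or Calcite code.
final class PositionalRowDemo {

    // Builds a row whose values follow the schema field order,
    // as the comment describes for RecordDataSource.
    static Object[] rowInSchemaOrder(List<String> schemaFields, Map<String, Object> record) {
        Object[] row = new Object[schemaFields.size()];
        for (int i = 0; i < schemaFields.size(); i++) {
            row[i] = record.get(schemaFields.get(i));
        }
        return row;
    }

    // Labels the row values purely by position with the query's column names
    // (assumed here to be what produces the mismatch).
    static Map<String, Object> labelByPosition(List<String> queryColumns, Object[] row) {
        Map<String, Object> labeled = new LinkedHashMap<>();
        for (int i = 0; i < queryColumns.size(); i++) {
            labeled.put(queryColumns.get(i), row[i]);
        }
        return labeled;
    }

    public static void main(String[] args) {
        // Schema inferred for the single-record case in this comment.
        List<String> inferredSchema =
                List.of("ArticleCode", "ProductCode", "ArticleName", "ProductName", "Country");
        // Column order of the query from the issue description.
        List<String> queryColumns =
                List.of("ArticleCode", "ArticleName", "ProductCode", "ProductName", "Country");
        Map<String, Object> record = Map.of(
                "ArticleCode", "12345",
                "ProductCode", "10101",
                "ArticleName", "Credit Card",
                "ProductName", "Porduct Credit",
                "Country", "RO");

        Object[] row = rowInSchemaOrder(inferredSchema, record);
        // ArticleName receives "10101" and ProductCode receives "Credit Card".
        System.out.println(labelByPosition(queryColumns, row));
    }
}
```

With the single-record schema, the second and third values end up under the 
wrong names, which matches the "Returning file" in the issue description.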

Another way of highlighting the issue, which I find more troublesome, is a 
query whose columns are, for example, in the reverse order of the schema:
{code:java}
SELECT Country, ProductName, ProductCode, ArticleName, ArticleCode FROM 
FLOWFILE {code}
The query results for the same two ingested records above are then
{code:java}
[ {
  "Country" : "12345",
  "ProductName" : "Credit Card",
  "ProductCode" : "10101",
  "ArticleName" : "Porduct Credit",
  "ArticleCode" : "RO"
}, {
  "Country" : "12346",
  "ProductName" : "Business Card",
  "ProductCode" : "10102",
  "ArticleName" : "Society Credit",
  "ArticleCode" : "RO"
} ]{code}
From this I conclude that any query that selects all the columns in an order 
that is not exactly the order of the defined schema will return incorrect 
values.

Also, please note that this issue occurs even when a schema is explicitly 
defined for the reader.
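
One possible way to align the values with the query columns is to look each 
query column up by name in the schema before reading the row. The sketch 
below is only an illustration under that assumption, not the actual NiFi fix; 
NameBasedProjectionDemo and projectByName are hypothetical names, and the 
sketch handles plain column names only (no aliases or expressions).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; it does not call any NiFi or Calcite code.
final class NameBasedProjectionDemo {

    // Resolves each query column to its index in the schema and reads
    // the value at that index, instead of relying on position alone.
    static Map<String, Object> projectByName(List<String> queryColumns,
                                             List<String> schemaFields,
                                             Object[] rowInSchemaOrder) {
        Map<String, Object> projected = new LinkedHashMap<>();
        for (String column : queryColumns) {
            int index = schemaFields.indexOf(column);
            if (index < 0) {
                throw new IllegalArgumentException("Unknown column: " + column);
            }
            projected.put(column, rowInSchemaOrder[index]);
        }
        return projected;
    }

    public static void main(String[] args) {
        // Schema inferred for the two-record case in this comment.
        List<String> schema =
                List.of("ArticleCode", "ArticleName", "ProductCode", "ProductName", "Country");
        // Column order of the reversed query.
        List<String> reversedQuery =
                List.of("Country", "ProductName", "ProductCode", "ArticleName", "ArticleCode");
        // First ingested record, with values in schema order.
        Object[] row = {"12345", "Credit Card", "10101", "Porduct Credit", "RO"};

        // Country now receives "RO" and ArticleCode receives "12345".
        System.out.println(projectByName(reversedQuery, schema, row));
    }
}
```

With a lookup by name, the reversed query keeps its column order in the output 
while each column still receives its own value.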



> QueryRecord not following column order
> --------------------------------------
>
>                 Key: NIFI-14162
>                 URL: https://issues.apache.org/jira/browse/NIFI-14162
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: alexduta
>            Assignee: Daniel Stieglitz
>            Priority: Minor
>
> It seems that in the NiFi QueryRecord processor, if you list the columns to 
> select from the FlowFile, the output follows the column order of the query 
> but takes the values in their order from the original FlowFile.
> As an example:
> Original Json flowfile:
> {code:java}
> [{
>   "ArticleCode" : "12345",
>   "ProductCode" : "10101",
>   "ArticleName" : "Credit Card",
>   "ProductName" : "Porduct Credit",
>   "Country" : "RO"
> }] {code}
>  
>  
> Query:
>  
> {code:java}
> select ArticleCode,ArticleName,ProductCode,ProductName,Country from 
> FLOWFILE
> {code}
>  
>  
> Returning file:
>  
> {code:java}
> [{
>   "ArticleCode" : "12345",
>   "ArticleName" : "10101",
>   "ProductCode" : "Credit Card",
>   "ProductName" : "Porduct Credit",
>   "Country" : "RO"
> }]  {code}
>  
> So it seems to just apply the column names in the query order, without 
> reordering the values to match.
>  
> However, if one of the records in the FlowFile has its fields in the queried 
> order, all the records are returned correctly:
> Original file:
> {code:java}
> [{
>   "ArticleCode" : "12345",
>   "ArticleName" : "Credit Card",
>   "ProductCode" : "10101",
>   "ProductName" : "Porduct Credit",
>   "Country" : "RO"
> },
> {
>   "ArticleCode" : "12346",
>   "ProductCode" : "10102",
>   "ArticleName" : "Business Card",
>   "ProductName" : "Society Credit",
>   "Country" : "RO"
> }]  {code}
> or
> {code:java}
> [
>     {
>   "ArticleCode" : "12345",
>   "ProductCode" : "10101",
>   "ArticleName" : "Credit Card",
>   "ProductName" : "Porduct Credit",
>   "Country" : "RO"
> },{
>   "ArticleCode" : "12346",
>   "ArticleName" : "Business Card",
>   "ProductCode" : "10102",
>   "ProductName" : "Society Credit",
>   "Country" : "RO"
> }]    {code}
> Returning file:
> {code:java}
> [{
>   "ArticleCode" : "12345",
>   "ArticleName" : "Credit Card",
>   "ProductCode" : "10101",
>   "ProductName" : "Porduct Credit",
>   "Country" : "RO"
> },
> {
>   "ArticleCode" : "12346",
>   "ArticleName" : "Business Card",
>   "ProductCode" : "10102",
>   "ProductName" : "Society Credit",
>   "Country" : "RO"
> }]  {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
