[
https://issues.apache.org/jira/browse/NIFI-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285035#comment-15285035
]
Josh Elser commented on NIFI-1280:
----------------------------------
bq. I don't know Calcite very well, but “read the data multiple times in order
to perform the JOIN” doesn't sound good.
I'm not 100% sure what all is implemented, but I was speaking generally about
[sort-merge joins|https://en.wikipedia.org/wiki/Sort-merge_join]. This is a
fairly common approach and is not bound by available memory: the sort phase
can spill to disk, and the merge phase streams both sorted inputs. I'm not sure
how else you could do this for arbitrarily large datasets without blowing out
memory.
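To make the idea concrete, here is a minimal sketch of the merge phase of a sort-merge join. This is not Calcite's implementation (I have not checked what it does); the function name {{merge_join}} and the row layout (tuples with the join key in the first position) are assumptions for illustration. It assumes both inputs are already sorted by the join key, for example by an external sort that spills sorted runs to disk.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right, key=itemgetter(0)):
    """Inner-join two iterables of tuples that are ALREADY sorted by `key`.

    Both inputs are consumed as streams. The only rows buffered in memory
    are the right-hand rows that share the current join key.
    """
    left_groups = groupby(left, key)
    right_groups = groupby(right, key)
    end = (None, None)  # sentinel returned once a side is exhausted

    lk, lrows = next(left_groups, end)
    rk, rrows = next(right_groups, end)
    while lrows is not None and rrows is not None:
        if lk < rk:
            # Left key has no match yet: advance the left side.
            lk, lrows = next(left_groups, end)
        elif lk > rk:
            # Right key has no match yet: advance the right side.
            rk, rrows = next(right_groups, end)
        else:
            # Keys match: emit every left/right combination for this key.
            right_run = list(rrows)
            for lrow in lrows:
                for rrow in right_run:
                    yield lrow + rrow
            lk, lrows = next(left_groups, end)
            rk, rrows = next(right_groups, end)
```

Each input is read once, front to back, so memory use depends on the largest run of duplicate keys on the right side rather than on the total size of either dataset. The "read the data multiple times" cost comes from the sort that has to happen before this step, not from the merge itself.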
> Create FilterCSVColumns Processor
> ---------------------------------
>
> Key: NIFI-1280
> URL: https://issues.apache.org/jira/browse/NIFI-1280
> Project: Apache NiFi
> Issue Type: Task
> Components: Extensions
> Reporter: Mark Payne
> Assignee: Toivo Adams
>
> We should have a Processor that allows users to easily filter out specific
> columns from CSV data. For instance, a user would configure two different
> properties: "Columns of Interest" (a comma-separated list of column indexes)
> and "Filtering Strategy" (Keep Only These Columns, Remove Only These Columns).
> We can do this today with ReplaceText, but it is far more difficult than it
> would be with this Processor, as the user has to use Regular Expressions,
> etc. with ReplaceText.
> Eventually a Custom UI could even be built that allows a user to upload a
> Sample CSV and choose columns from it by dragging and selecting the desired
> columns, similar to the way that Excel works when importing CSV. That
> would certainly be a larger undertaking and would not need to be done for an
> initial implementation.
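For reference, the filtering behavior described in the issue can be sketched in a few lines. This is only an illustration of the two proposed properties, not the Processor itself: the function name {{filter_csv_columns}}, the zero-based indexes, and the {{"keep"}} / {{"remove"}} strategy values are all assumptions, since the issue does not pin them down.

```python
import csv
import io


def filter_csv_columns(text, columns, strategy="keep"):
    """Return CSV text with columns kept or removed, one row at a time.

    columns:  zero-based column indexes (the "Columns of Interest").
    strategy: "keep" to keep only those columns, "remove" to drop them.
    """
    if strategy not in ("keep", "remove"):
        raise ValueError("strategy must be 'keep' or 'remove'")
    selected = set(columns)
    keep_selected = strategy == "keep"

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        # A column survives when its membership matches the strategy.
        writer.writerow(
            [value for i, value in enumerate(row) if (i in selected) == keep_selected]
        )
    return out.getvalue()
```

Using a real CSV parser rather than a regular expression means quoted fields that contain commas are handled correctly, which is the main pain point with the ReplaceText approach.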