[
https://issues.apache.org/jira/browse/NIFI-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271027#comment-15271027
]
Toivo Adams commented on NIFI-1280:
-----------------------------------
Hi,
I propose we use a SQL select statement instead of the two separate
properties, "Columns of Interest" (a comma-separated list of column indexes)
and "Filtering Strategy" (Keep Only These Columns, Remove Only These Columns).
For example, let's assume the data is:
FIRST_NAME:string,LAST_NAME,COMPANY_NAME,ADDRESS,CITY,COUNTY,STATE,zip,phone1,phone2,email,web
"James","Butt","Benton, John B Jr","6649 N Blue Gum St","New Orleans","Orleans","LA",70116,"504-621-8927","504-845-1427","[email protected]","http://www.bentonjohnbjr.com"
"Josephine","Darakjy","Chanay, Jeffrey A Esq","4 B Blue Ridge Blvd","Brighton","Livingston","MI",48116,"810-292-9388","810-374-9840","[email protected]","http://www.chanayjeffreyaesq.com"
"Art","Venere","Chemel, James L Cpa","8 W Cerritos Ave #54","Bridgeport","Gloucester","NJ","08014","856-636-8749","856-264-4130","[email protected]","http://www.chemeljameslcpa.com"
"Lenna","Paprocki","Feltz Printing Service","639 Main St","Anchorage","Anchorage","AK",99501,"907-385-4412","907-921-2010","[email protected]","http://www.feltzprintingservice.com"
"Donette","Foller","Printing Dimensions","34 Center St","Hamilton","Butler","OH",45011,"513-570-1893","513-549-4561","[email protected]","http://www.printingdimensions.com"
"Simona","Morasca","Chapman, Ross E Esq","3 Mcauley Dr","Ashland","Ashland","OH",44805,"419-503-2484","419-800-6759","[email protected]","http://www.chapmanrosseesq.com"
"Mitsue","Tollner","Morlong Associates","7 Eads St","Chicago","Cook","IL",60632,"773-573-6914","773-924-8565","[email protected]","http://www.morlongassociates.com"
. . .
The SQL select parameter is:
select first_name, last_name, company_name, address, city from SALES.US500
where city='New York'
and the result is:
Willow, Kusko, U Pull It, 90991 Thorburn Ave, New York
Alishia, Sergi, Milford Enterprises Inc, 2742 Distribution Way, New York
Jose, Stockham, Tri State Refueler Co, 128 Bransten Rd, New York
Brock, Bolognia, Orinda News, 4486 W O St #1, New York
Tawna, Buvens, H H H Enterprises Inc, 3305 Nabell Ave #679, New York
Ozell, Shealy, Silver Bros Inc, 8 Industry Ln, New York
Layla, Springe, Chadds Ford Winery, 229 N Forty Driv, New York
Fausto, Agramonte, Marriott Hotels Resorts Suites, 5 Harrison Rd, New York
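To make the idea concrete, here is a minimal sketch of running a select like the one above over CSV text. It is only an illustration, not the proposed processor: it assumes Python's built-in sqlite3 module as a stand-in SQL engine, uses a trimmed, hypothetical subset of the columns, drops the ":string" type hint from the header, and names the table US500 without the SALES schema. The helper query_csv is a made-up name.

```python
import csv
import io
import sqlite3

# Trimmed, hypothetical subset of the sample data (five columns only).
SAMPLE_CSV = '''FIRST_NAME,LAST_NAME,COMPANY_NAME,ADDRESS,CITY
"James","Butt","Benton, John B Jr","6649 N Blue Gum St","New Orleans"
"Willow","Kusko","U Pull It","90991 Thorburn Ave","New York"
"Alishia","Sergi","Milford Enterprises Inc","2742 Distribution Way","New York"
'''


def query_csv(csv_text, table, sql):
    """Load CSV text into an in-memory SQLite table and run `sql` against it."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first line supplies the column names
    conn = sqlite3.connect(":memory:")
    try:
        cols = ", ".join('"{}"'.format(name) for name in header)
        marks = ", ".join("?" for _ in header)
        conn.execute('CREATE TABLE "{}" ({})'.format(table, cols))
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table, marks), reader
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


rows = query_csv(
    SAMPLE_CSV,
    "US500",
    "select first_name, last_name, company_name, address, city "
    "from US500 where city = 'New York'",
)
for row in rows:
    print(", ".join(row))
```

The select statement does both jobs at once: the column list replaces "Columns of Interest" plus "Filtering Strategy", and the where clause adds row filtering that the two properties could not express.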
SQL is well understood, and many different combinations can be used.
You can easily rename columns, perform simple calculations, aggregations, etc.
Even joins are possible, but they are out of scope here.
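As a rough illustration of renaming, simple calculations and aggregation, the sketch below runs two such statements. It again assumes Python's sqlite3 module as a stand-in engine and a hypothetical four-column US500 table filled with three rows taken from the sample data; the aliases full_name, town and people are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE US500 (first_name, last_name, city, state)")
conn.executemany(
    "INSERT INTO US500 VALUES (?, ?, ?, ?)",
    [
        ("James", "Butt", "New Orleans", "LA"),
        ("Donette", "Foller", "Hamilton", "OH"),
        ("Simona", "Morasca", "Ashland", "OH"),
    ],
)

# Rename a column (city -> town) and build a new value from two columns.
renamed = conn.execute(
    "select first_name || ' ' || last_name as full_name, city as town "
    "from US500 order by last_name"
).fetchall()

# Aggregate: count the rows in each state.
per_state = conn.execute(
    "select state, count(*) as people from US500 "
    "group by state order by state"
).fetchall()
conn.close()

print(renamed)
print(per_state)
```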
Thanks
toivo
> Create FilterCSVColumns Processor
> ---------------------------------
>
> Key: NIFI-1280
> URL: https://issues.apache.org/jira/browse/NIFI-1280
> Project: Apache NiFi
> Issue Type: Task
> Components: Extensions
> Reporter: Mark Payne
> Assignee: Toivo Adams
>
> We should have a Processor that allows users to easily filter out specific
> columns from CSV data. For instance, a user would configure two different
> properties: "Columns of Interest" (a comma-separated list of column indexes)
> and "Filtering Strategy" (Keep Only These Columns, Remove Only These Columns).
> We can do this today with ReplaceText, but it is far more difficult than it
> would be with this Processor, as the user has to use Regular Expressions,
> etc. with ReplaceText.
> Eventually a Custom UI could even be built that allows a user to upload a
> Sample CSV and choose which columns to keep from there, similar to the way
> that Excel works when importing CSV by dragging and selecting the desired
> columns. That would certainly be a larger undertaking and would not need to
> be done for an initial implementation.