[jira] [Commented] (FALCON-36) Ability to ingest data from databases

Ajay Yadava (JIRA) Tue, 04 Aug 2015 02:05:45 -0700

    [ 
https://issues.apache.org/jira/browse/FALCON-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653314#comment-14653314
 ]


Ajay Yadava commented on FALCON-36:
-----------------------------------

[~me.venkatr] I am suggesting a type attribute on a single datasource entity 
instead of making database, Kafka etc. separate top level entities. This is 
completely in line with consumer requirements. Every need and use case, 
including listing all databases, can be trivially met by filtering on the 
type attribute.
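
To illustrate the filtering, here is a minimal sketch in Python. It is not 
Falcon code: the Datasource class, the DatasourceType values, the list_by_type 
helper and the sample names are all hypothetical, chosen only to show that one 
entity kind with a type attribute still lets you list all databases.

```python
from dataclasses import dataclass
from enum import Enum


class DatasourceType(Enum):
    """Hypothetical set of supported datasource types."""
    DATABASE = "database"
    KAFKA = "kafka"


@dataclass(frozen=True)
class Datasource:
    """A single top level entity kind; the variety lives in `type`."""
    name: str
    type: DatasourceType


def list_by_type(datasources, wanted):
    """Return the datasources whose type attribute matches `wanted`."""
    return [d for d in datasources if d.type is wanted]


# Hypothetical registry contents, for illustration only.
registry = [
    Datasource("orders-mysql", DatasourceType.DATABASE),
    Datasource("clickstream", DatasourceType.KAFKA),
    Datasource("billing-oracle", DatasourceType.DATABASE),
]

database_names = [d.name for d in list_by_type(registry, DatasourceType.DATABASE)]
kafka_names = [d.name for d in list_by_type(registry, DatasourceType.KAFKA)]
```

Adding a new kind of source in this sketch means adding one enum value, not a 
new entity kind with its own listing command.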

I disagree with the current thinking that top level entities of type database 
instead of datasource are better from a usability standpoint. They are worse. I 
have already given a case of confusion between streaming feeds and Kafka 
entities. 

It's far easier to understand and use if we say that datasources are the 
sources for importing and exporting data, and that database, Kafka etc. are the 
various types of datasource supported. By contrast, it's confusing to say we 
have one entity of type database which has x, y, z and then we have Kafka. What 
is the purpose of each of them? Can we reuse and treat Kafka entities as feeds? 
Are Kafka entities schedulable? All these questions will need to be answered 
for each new type of entity. Another point is that users need to remember all 
the types, because they have to specify them in various commands, and it's 
easier to remember just one type, "datasource", than both "database" and 
"kafka". There are several examples like that.

From a code maintainability standpoint, too, it is a lot more helpful to 
classify them as a single entity. For example, what is the load order of 
entities? What about validity? It's a lot easier to say that entities of type 
datasource load at this point in the order than to specify it for each type of 
datasource. 
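
The load order point can be sketched the same way. This is only an 
illustration in Python, not Falcon's implementation: the LOAD_ORDER list, the 
position of "datasource" in it and the sample entities are assumptions. With a 
single datasource entity kind, the order needs one entry no matter how many 
datasource types exist.

```python
# Hypothetical load order: one slot covers every kind of datasource.
LOAD_ORDER = ["cluster", "datasource", "feed", "process"]


def load_rank(entity):
    """Rank an (entity_type, name) pair by its position in LOAD_ORDER."""
    entity_type, _name = entity
    return LOAD_ORDER.index(entity_type)


# Hypothetical entities, listed in arbitrary order.
entities = [
    ("process", "daily-agg"),
    ("datasource", "orders-mysql"),   # a database
    ("cluster", "primary"),
    ("feed", "orders"),
    ("datasource", "clickstream"),    # a Kafka source
]

# sorted() is stable, so entities of the same kind keep their relative order.
ordered = sorted(entities, key=load_rank)
```

If database and Kafka were separate top level entities, LOAD_ORDER would need 
a new entry, and a decision about where it goes, for each of them.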

Please reconsider.

> Ability to ingest data from databases
> -------------------------------------
>
>                 Key: FALCON-36
>                 URL: https://issues.apache.org/jira/browse/FALCON-36
>             Project: Falcon
>          Issue Type: Improvement
>          Components: acquisition
>    Affects Versions: 0.3
>            Reporter: Venkatesh Seetharam
>            Assignee: Venkat Ramachandran
>         Attachments: FALCON-36.patch, FALCON-36.patch.2, 
> FALCON-36.rebase.patch, FALCON-36.review.patch, Falcon Data Ingestion - 
> Proposal.docx, falcon-36.xsd.patch.1
>
>
> Attempt to address data import from RDBMS into Hadoop and export of data from 
> Hadoop into RDBMS. The plan is to use Sqoop 1.x to materialize data motion 
> from/to RDBMS to/from HDFS. Hive will not be integrated in the first pass 
> until Falcon has a first class integration with HCatalog.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
