Though it would be great to avoid creating a meta table, we cannot avoid it if we want to achieve complete EXACTLY_ONCE here. Even the Kafka documentation suggests that the exactly-once implementation is not perfect:
*"This is not "perfect exact once" in 2 cases:
1 Multiple producers produce messages to same kafka partition
2 You have same message sent out and before kafka synchronized this message among all the brokers, the operator is started again."*

Even other distributed protocols, such as three-phase commit, which are generally used in distributed systems, rely on write-ahead logging (WAL) on each system participating in the transaction. Since a DB is involved, we cannot have a WAL without something like the dt_meta table. So even such protocols won't be useful here.

The following article also suggests that exactly-once in a distributed system is not possible without committing the offset to the same system. Also, unlike Kafka, we don't have an offset with a DB.
http://ben.kirw.in/2014/11/28/kafka-patterns/

One suggestion, though: if the user is reluctant to create a table in the database, we can use HDFS to give an "almost EXACTLY_ONCE" guarantee, like Kafka.

Ajay

On Mon, Jan 16, 2017 at 12:04 PM, Chinmay Kolhatkar <chin...@apache.org> wrote:

-1 for automatic schema creation...

Moreover, I am wondering whether asking the user to create a dt_meta table is the right way. From an admin's perspective, a request to create a meta table looks wrong to me. The dt_meta table is created for the purpose of exactly-once, but it does not hold any user data. By this logic, an admin might deny developers permission to create the table.

I suggest starting a separate thread on doing exactly-once for JDBC insert in a cleaner way. We can take a look at the Kafka or File outputs to see how they achieve exactly-once without creating a meta location at the destination.

-Chinmay.

On Mon, Jan 16, 2017 at 11:16 AM, Pradeep Kumbhar <prad...@datatorrent.com> wrote:

> +1 on having the operator documentation explicitly mention that the "dt_meta"
> table is mandatory for the operator to work correctly. Also provide a sample
> table creation query for reference.
>
> On Sat, Jan 14, 2017 at 1:05 PM, AJAY GUPTA <ajaygit...@gmail.com> wrote:
>
> > Since the query can be different for different databases, the user will
> > have to provide query to the operator. Rather than this, I believe it's
> > easier for user to directly execute create table query on DB.
> >
> > Also, the create table script won't be that heavy that we create script for
> > it. Probably adding a generic type of query in the docs itself should
> > suffice.
> >
> > Ajay
> >
> > On Sat, 14 Jan 2017 at 10:27 AM, Yogi Devendra <yogideven...@apache.org>
> > wrote:
> >
> > > As Aniruddha pointed out, table creation should be done by dbadmin.
> > >
> > > In that case, utility script will be helpful.
> > >
> > > If we embed this code inside operator or application; then it will be
> > > difficult for dbadmin to use it.
> > >
> > > ~ Yogi
> > >
> > > On 14 January 2017 at 03:43, Thomas Weise <t...@apache.org> wrote:
> > >
> > > > > > > -1 for automatic schema modification, unless the user asked for it. See
> > > > > > > comment on JIRA.
> > > > > > >
> > > > > > > On Fri, Jan 13, 2017 at 5:11 AM, Aniruddha Thombare <
> > > > > > > anirud...@datatorrent.com> wrote:
> > > > > > >
> > > > > > > > The tables should be created / altered by dbadmin.
> > > > > > > > We shouldn't worry about table creations as its one-time activity.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > A
> > > > > > > >
> > > > > > > > _____________________________________
> > > > > > > > Sent with difficulty, I mean handheld ;)
> > > > > > > >
> > > > > > > > On 13 Jan 2017 6:37 pm, "Yogi Devendra" <yogideven...@apache.org> wrote:
> > > > > > > >
> > > > > > > > I am not very keen on having utility script.
> > > > > > > > But, "no side-effects without explicit ask by the end-user" is important.
> > > > > > > >
> > > > > > > > ~ Yogi
> > > > > > > >
> > > > > > > > On 13 January 2017 at 16:44, Priyanka Gugale <pri...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > IMO it's okay to create table in java code. We should document it in
> > > > > > > > > operator guide as well as put a log message when we create table.
> > > > > > > > > And in case you don't have privileges, the operator should throw
> > > > > > > > > meaningful message.
> > > > > > > > >
> > > > > > > > > -Priyanka
> > > > > > > > >
> > > > > > > > > On Fri, Jan 13, 2017 at 4:07 PM, Yogi Devendra <yogideven...@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > My suggestions:
> > > > > > > > > >
> > > > > > > > > > 1. Have a separate utility script for creating this table.
> > > > > > > > > > 2. Have README for the utility script.
> > > > > > > > > > 3. Mention about the utility script in the operator javadocs.
> > > > > > > > > > 4. Mention about the utility script in the application README.
> > > > > > > > > > 5. If at all, you wish to ease out the process; you can introduce a flag
> > > > > > > > > >    like autoPopulateMetaTable. But the default value of this flag should
> > > > > > > > > >    be off.
> > > > > > > > > > 6. I would prefer to avoid side-effects unless explicitly asked by the
> > > > > > > > > >    end user.
> > > > > > > > > > 7. Relevant exceptions should be caught and should have a message which
> > > > > > > > > >    can be understood by the end user.
> > > > > > > > > >
> > > > > > > > > > ~ Yogi
> > > > > > > > > >
> > > > > > > > > > On 13 January 2017 at 15:57, Hitesh Kapoor <hit...@datatorrent.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi All,
> > > > > > > > > > >
> > > > > > > > > > > Currently, to use JdbcPOJOInsertOutputOperator, the user needs to create
> > > > > > > > > > > the "dt_meta" table to enforce exactly-once processing semantics. If the
> > > > > > > > > > > user fails to create this table before launching the application, an
> > > > > > > > > > > exception is thrown.
> > > > > > > > > > > To handle this scenario we can automate the process of creating this
> > > > > > > > > > > table, assuming the user has the appropriate privileges. The problem with
> > > > > > > > > > > this approach is that it may not be a very good idea to modify the user's
> > > > > > > > > > > database automatically; also, if the user doesn't have the appropriate
> > > > > > > > > > > privileges, it will eventually throw an exception (however, a different
> > > > > > > > > > > exception).
> > > > > > > > > > > So I need your opinion on whether we should automate the creation of this
> > > > > > > > > > > internal table (if it doesn't exist), continue with the existing
> > > > > > > > > > > behaviour, or do anything else.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Hitesh
>
> --
> *regards,*
> *~pradeep*
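
To make the argument in this thread concrete (exactly-once needs the progress marker committed to the same system as the data), here is a minimal sketch. It is not the operator's actual implementation: it uses Python's built-in sqlite3 as a stand-in for the JDBC target, the `events` table and both function names other than the table setup are hypothetical, and the `dt_meta` column names (`dt_app_id`, `dt_operator_id`, `dt_window`) are assumed for illustration. The setup function plays the role of the dbadmin running a one-time create-table query, as most replies above prefer.

```python
import sqlite3


def dbadmin_setup(path=":memory:"):
    """One-time setup, standing in for the dbadmin's create-table script."""
    conn = sqlite3.connect(path)
    # Hypothetical user table that receives the operator's output.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )
    # Meta table: one row per (application, operator) holding the last
    # fully committed window. Column names are assumptions.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dt_meta ("
        "dt_app_id TEXT NOT NULL, "
        "dt_operator_id INTEGER NOT NULL, "
        "dt_window INTEGER NOT NULL, "
        "PRIMARY KEY (dt_app_id, dt_operator_id))"
    )
    conn.commit()
    return conn


def last_committed_window(conn, app_id, operator_id):
    """Return the last committed window id, or -1 if nothing was committed."""
    row = conn.execute(
        "SELECT dt_window FROM dt_meta "
        "WHERE dt_app_id = ? AND dt_operator_id = ?",
        (app_id, operator_id),
    ).fetchone()
    return row[0] if row else -1


def write_window(conn, app_id, operator_id, window_id, rows):
    """Write one window's rows and its window id in a single transaction.

    Returns False when the window was already committed (a replay after
    restart), so the rows are not inserted a second time.
    """
    if window_id <= last_committed_window(conn, app_id, operator_id):
        return False
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO events (payload) VALUES (?)", [(r,) for r in rows]
        )
        # INSERT OR REPLACE is SQLite syntax; other databases need their
        # own upsert form, which is why the query differs per database.
        conn.execute(
            "INSERT OR REPLACE INTO dt_meta "
            "(dt_app_id, dt_operator_id, dt_window) VALUES (?, ?, ?)",
            (app_id, operator_id, window_id),
        )
    return True


conn = dbadmin_setup()
write_window(conn, "app-1", 7, 1, ["a", "b"])  # first delivery of window 1
write_window(conn, "app-1", 7, 1, ["a", "b"])  # replay of window 1, skipped
event_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Because the data rows and the window id commit or roll back together, a crash mid-window leaves neither behind, and a replayed window is detected from `dt_meta` alone. Keeping the marker outside the database (for example on HDFS, as suggested above) would lose that atomicity, which is why that variant is only "almost" exactly-once.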