Also, a note regarding accepting fieldInfo objects:
FieldInfo has fields for java type of the column, but not the SQL data type
which is needed to set the parameters in the SQL statement. This still
needs to be derived from the database (as in insert).
In more complex scenarios, as mentioned earlier, this may not always be
possible. In this case we have the following options:
1. Accept the SQL data types from the user for the parameters ("?") in a
new data structure.
2. Accept the JSON structure as specified earlier.
3. Modify FieldInfo to have an additional optional field which indicates
the SQL data type of the parameter ("?")
I am planning to go ahead with option 3.
Comments?
-Bhupesh
On Mon, Dec 28, 2015 at 11:26 PM, Chandni Singh <[email protected]>
wrote:
> Yeah that sounds good.
>
> Chandni
>
> On Mon, Dec 28, 2015 at 2:46 AM, Bhupesh Chawda <[email protected]>
> wrote:
>
> > Thanks Chandni for the comments.
> >
> > Additionally I think, we should do away with constructing the SQL query
> in
> > the operator. This is because it is only possible in case of simple
> insert
> > statements. In case a where clause is needed in the insert statement, we
> > cannot construct the SQL easily. If we are accepting a parametrized query
> > from the user for update/merge, why not do the same for insert statements
> > as well? Then the hierarchy would look like:
> > - AbstractJdbcPOJOOutputOperator (As suggested)
> > -- JdbcPOJOOutputOperator (Takes care of all statements insert, update,
> > merge, delete)
> >
> > We don't need a separate operator for inserts and update/merge.
> >
> > Thanks.
> > -Bhupesh
> >
> >
> > On Mon, Dec 28, 2015 at 3:32 PM, Chandni Singh <[email protected]>
> > wrote:
> >
> > > 1. FieldInfo, is ultimately a custom object and any user who uses
> this
> > > operator has to construct an object, populate it and then use it. We
> > are
> > > trying to avoid using any custom object and allow any user to use
> the
> > > operator without writing any extra code; just configuration.
> > > FieldInfo is a way to provide configuration in a UI friendly way.
> > Providing
> > > configuration as JSON is not UI friendly.
> > >
> > > 2. In case of update / merge, we need what SQL data types that the
> > > expression provided by the user would evaluate to. In case of insert
> > we
> > > can
> > > go and fetch the data types of the columns directly from DB.
> However,
> > > the
> > > same is not possible for custom expressions; "avg(salary)" for
> > instance.
> > > Ok so here is where you can make a change.
> > > - JDBCPojoOutput can be renamed to AbstractJDBCPojoOutpuOperator.
> > > - Abstraction is to fetch the type of column.
> > > - Add a concrete JDBCPojoInsertOutput that derives the types of
> columns
> > > directly from DB. Please note that FieldInfo can also provide type of
> > > derived column.
> > >
> > >
> > > You mentioned "We are trying to avoid using any custom object and allow
> > any
> > > user to use the
> > > operator without writing any extra code".
> > > This I think is specific to your use case. You can create an extension
> of
> > > the above which takes JSON blob and creates FieldInfos from it.
> > >
> > > Chandni
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Dec 28, 2015 at 1:48 AM, Bhupesh Chawda <
> [email protected]
> > >
> > > wrote:
> > >
> > > > Hi Chandni,
> > > >
> > > > Following are the issues:
> > > >
> > > > 1. FieldInfo, is ultimately a custom object and any user who uses
> > this
> > > > operator has to construct an object, populate it and then use it.
> We
> > > are
> > > > trying to avoid using any custom object and allow any user to use
> > the
> > > > operator without writing any extra code; just configuration.
> > > > 2. In case of update / merge, we need what SQL data types that the
> > > > expression provided by the user would evaluate to. In case of
> insert
> > > we
> > > > can
> > > > go and fetch the data types of the columns directly from DB.
> > However,
> > > > the
> > > > same is not possible for custom expressions; "avg(salary)" for
> > > instance.
> > > >
> > > > Thanks.
> > > >
> > > > -Bhupesh
> > > >
> > > >
> > > > On Mon, Dec 28, 2015 at 2:57 PM, Chandni Singh <
> > [email protected]>
> > > > wrote:
> > > >
> > > > > Hi Bhupesh,
> > > > >
> > > > > JDBCPojoOutputOperator was written for a demo and therefore it was
> > > marked
> > > > > Evolving which is why I had mentioned that you should feel free to
> > > modify
> > > > > it.
> > > > >
> > > > > I think an insert query can be as complex as any other query. It
> uses
> > > > > FieldInfo because in the app builder it is easy for the user to
> > provide
> > > > > that instead of JSON String.
> > > > >
> > > > > Can you please provide specifics about what it is that you find
> > > difficult
> > > > > to change/implement for providing update/merge/delete support in
> > > > > JDBCPojoOutputOperator?
> > > > >
> > > > > Chandni
> > > > >
> > > > > On Mon, Dec 28, 2015 at 1:09 AM, Bhupesh Chawda <
> > > [email protected]
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > It has been pointed out that adding another class for handling
> > > update /
> > > > > > merge queries would not be a good option.
> > > > > >
> > > > > > Here are the current implementation details:
> > > > > >
> > > > > > - We have an existing class: JdbcPOJOOutputOperator which
> > accepts
> > > a
> > > > > list
> > > > > > of FieldInfo objects. Each element of this list indicates the
> > > > details
> > > > > > (column name, pojo field expression and datatype) of fields
> that
> > > > need
> > > > > > to be
> > > > > > inserted. Using this, the operator formulates the insert query
> > in
> > > > the
> > > > > > setup
> > > > > > method and identifies the sql datatypes of these columns from
> > the
> > > > > > database
> > > > > > using the table name.
> > > > > >
> > > > > >
> > > > > > - Now, coming to the update / merge feature, it is difficult
> to
> > > > > > formulate the update / merge query in the operator logic due
> to
> > > the
> > > > > > complex
> > > > > > structure of these statements. For this reason, we plan to
> take
> > a
> > > > > > parametrized SQL query from the user. This may look like:
> > *"update
> > > > > table
> > > > > > set x = ?, y = ? where z + w > ? and a == 1;"*. Such
> statements
> > > can
> > > > be
> > > > > > accepted from the user in addition to a json string which
> > > indicates
> > > > > the
> > > > > > details for the parameters: *column name, the pojo expression
> > and
> > > > the
> > > > > > sql data type* of the expression. Note that this information
> is
> > > > > similar
> > > > > > to the FieldInfo object, but is a string which can be
> configured
> > > > > easily
> > > > > > by
> > > > > > the user. Also note that the data type which is accepted is
> the
> > > SQL
> > > > > data
> > > > > > type. Using this info we can populate the parametrized query
> and
> > > run
> > > > > it
> > > > > > based on the incoming POJO.
> > > > > >
> > > > > > The second approach is able to handle all kinds of queries
> (insert
> > /
> > > > > update
> > > > > > / merge / delete). However, since we already have the
> > > > > > JdbcPOJOOutputOperator, we would like to merge the new
> > functionality
> > > > into
> > > > > > the same class.
> > > > > >
> > > > > > Here we have the following options:
> > > > > >
> > > > > > 1. Change the existing class (JdbcPOJOOutputOperator) to the
> > > second
> > > > > > approach which is more generic and also handles inserts.
> > > > > > 2. Add the update/ merge functionality to the existing class
> > > without
> > > > > > changing the existing functionality. This will have two
> > different
> > > > ways
> > > > > > that
> > > > > > insert queries may be handled in the operator.
> > > > > > 3. Add another class which extends from JdbcPOJOOutputOperator
> > and
> > > > > have
> > > > > > the update/merge functionality there. (This is not
> recommended.)
> > > > > > 4. Any other approach.
> > > > > >
> > > > > > Please suggest.
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > -Bhupesh.
> > > > > >
> > > > > >
> > > > > > On Mon, Dec 14, 2015 at 11:04 AM, Chandni Singh <
> > > > [email protected]
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > No I don't think we are restricting Malhar to just abstract
> > > classes.
> > > > > > > Whenever they are couple of use cases that we see quite often,
> we
> > > add
> > > > > > > concrete implementations.
> > > > > > >
> > > > > > > For eg. FileLineInputOperator which is a concrete
> implementation
> > of
> > > > > > > AbstractFileInputOperator. FSSliceReader is an example as well.
> > > > > > >
> > > > > > > In AbstractJdbcOutputOperator case there hasn't been such
> common
> > > > > > > insert/update query.
> > > > > > >
> > > > > > > Also if you look at the example I provided, it is very simple
> to
> > > > > provide
> > > > > > a
> > > > > > > concrete implementation.
> > > > > > >
> > > > > > > If you would like to change JdbcPOJOOutputOperator to work for
> > > > > > > "UPDATE/MERGE" then please go ahead.
> > > > > > >
> > > > > > > Chandni
> > > > > > >
> > > > > > > On Sun, Dec 13, 2015 at 8:47 PM, Bhupesh Chawda <
> > > > > [email protected]
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I see. So, just to understand more, do we plan to keep Malhar
> > > > > > restricted
> > > > > > > to
> > > > > > > > the base functionality (as in abstract classes)? And put the
> > > > > > > configuration
> > > > > > > > aspect / concrete implementations in apps that use these
> > > operators?
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > > Bhupesh
> > > > > > > >
> > > > > > > > On Sat, Dec 12, 2015 at 5:43 AM, Chandni Singh <
> > > > > > [email protected]>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Here is an example of doing Upsert with JDBC:
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Chandni
> > > > > > > > >
> > > > > > > > > On Fri, Dec 11, 2015 at 11:19 AM, Chandni Singh <
> > > > > > > [email protected]
> > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > The operators are under Malhar/lib/db/jdbc.
> > > > > > > > > >
> > > > > > > > > > Here is one of them:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
> > > > > > > > > >
> > > > > > > > > > They work with any kind PreparedStatement - insert or
> > update
> > > > > > > > > >
> > > > > > > > > > Chandni
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh Chawda <
> > > > > > > > > [email protected]>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> Hi Chandni,
> > > > > > > > > >>
> > > > > > > > > >> I don't see an update query being handled in the
> operator.
> > > > Could
> > > > > > you
> > > > > > > > > >> please
> > > > > > > > > >> point me to the appropriate class?
> > > > > > > > > >> Or did you mean that handling a update query is just a
> > > matter
> > > > of
> > > > > > > > > extending
> > > > > > > > > >> the class and providing a concrete implementation?
> > > > > > > > > >>
> > > > > > > > > >> Thanks.
> > > > > > > > > >> -Bhupesh
> > > > > > > > > >>
> > > > > > > > > >> On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh <
> > > > > > > > > [email protected]>
> > > > > > > > > >> wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Hi Bhupesh,
> > > > > > > > > >> >
> > > > > > > > > >> > The current abstract JDBC Output Operators in library
> > are
> > > > > > generic
> > > > > > > > and
> > > > > > > > > >> have
> > > > > > > > > >> > already been used in multiple POCs and applications.
> In
> > > fact
> > > > > > this
> > > > > > > > > >> operator
> > > > > > > > > >> > has matured through customer use cases. It is not just
> > an
> > > > > insert
> > > > > > > > > >> operator.
> > > > > > > > > >> > We have used it to perform update and inserts.
> > > > > > > > > >> >
> > > > > > > > > >> > That said, I don't think it is a good idea to
> introduce
> > > > input
> > > > > > > format
> > > > > > > > > in
> > > > > > > > > >> > these abstract implementations. It is written to
> handle
> > > any
> > > > > > type
> > > > > > > of
> > > > > > > > > >> query,
> > > > > > > > > >> > be it a procedure call (that was an actual customer
> use
> > > > case).
> > > > > > > > > >> >
> > > > > > > > > >> > Chandni
> > > > > > > > > >> >
> > > > > > > > > >> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh Chawda <
> > > > > > > > > >> [email protected]>
> > > > > > > > > >> > wrote:
> > > > > > > > > >> >
> > > > > > > > > >> > > Hi All,
> > > > > > > > > >> > >
> > > > > > > > > >> > > We are planning to proceed with the following
> approach
> > > for
> > > > > > JDBC
> > > > > > > > > >> *update*
> > > > > > > > > >> > > operator:
> > > > > > > > > >> > >
> > > > > > > > > >> > > - *Update Query Configuration*
> > > > > > > > > >> > >
> > > > > > > > > >> > >
> > > > > > > > > >> > > - Example Update Query: *update tableName set a =
> > ?**
> > > > > > where b
> > > > > > > > = ?
> > > > > > > > > >> and
> > > > > > > > > >> > c
> > > > > > > > > >> > > > ?;*
> > > > > > > > > >> > > - Example JSON input array for parameter
> > > > instantiations:
> > > > > > > *[{a,
> > > > > > > > > >> > > expression, INTEGER}, {b, expression, VARCHAR},
> {c,
> > > > > > > expression,
> > > > > > > > > >> > DATE}]*
> > > > > > > > > >> > >
> > > > > > > > > >> > > We are also planning to change the JDBC Output
> > Operator
> > > in
> > > > > > > Malhar
> > > > > > > > > >> Library
> > > > > > > > > >> > > which currently does just insert. We plan to make
> the
> > > > input
> > > > > > > format
> > > > > > > > > >> > > consistent for both insert and update and hence the
> > > change
> > > > > to
> > > > > > > the
> > > > > > > > > >> current
> > > > > > > > > >> > > way of configuration using JSON. Following would be
> > the
> > > > > config
> > > > > > > for
> > > > > > > > > >> > inserts:
> > > > > > > > > >> > >
> > > > > > > > > >> > > - *Insert Query Configuration*
> > > > > > > > > >> > >
> > > > > > > > > >> > >
> > > > > > > > > >> > > - Example Insert Query: *insert into tableName
> > values
> > > > (?,
> > > > > > ?,
> > > > > > > > .. ,
> > > > > > > > > >> ?);*
> > > > > > > > > >> > > - Example JSON input array for parameter
> > > > instantiations:
> > > > > > > *[{a,
> > > > > > > > > >> > > expression, INTEGER}, {b, expression, VARCHAR},
> ..
> > ,
> > > > {c,
> > > > > > > > > >> expression,
> > > > > > > > > >> > > DATE}]*
> > > > > > > > > >> > >
> > > > > > > > > >> > > Please let us know your thoughts.
> > > > > > > > > >> > >
> > > > > > > > > >> > > Thanks.
> > > > > > > > > >> > >
> > > > > > > > > >> > > -Bhupesh
> > > > > > > > > >> > >
> > > > > > > > > >> > >
> > > > > > > > > >> > >
> > > > > > > > > >> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh Chawda <
> > > > > > > > > >> [email protected]>
> > > > > > > > > >> > > wrote:
> > > > > > > > > >> > >
> > > > > > > > > >> > > > Hi All,
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > Would it be a good idea to introduce the update
> > > > > > functionality
> > > > > > > to
> > > > > > > > > the
> > > > > > > > > >> > JDBC
> > > > > > > > > >> > > > output operator in Apache Apex Malhar library.
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > The following are possible approaches:
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > 1. Accept a update query from the user with
> place
> > > > > holders
> > > > > > > for
> > > > > > > > > >> > values.
> > > > > > > > > >> > > > Example: *update tableName set a = ?, b = ?
> > where c
> > > > = ?
> > > > > > and
> > > > > > > > d >
> > > > > > > > > >> ?*.
> > > > > > > > > >> > > > Here "?" will be provided by the user as java
> > > > > expressions
> > > > > > > > which
> > > > > > > > > >> will
> > > > > > > > > >> > > be
> > > > > > > > > >> > > > evaluated from the incoming tuple.
> > > > > > > > > >> > > > 2. Another option is to accept in some
> > > configuration
> > > > > > format
> > > > > > > > > >> (json /
> > > > > > > > > >> > > > xml) the following and formulate the query in
> the
> > > > > > operator.
> > > > > > > > > This
> > > > > > > > > >> can
> > > > > > > > > >> > > become
> > > > > > > > > >> > > > arbitrarily complex.
> > > > > > > > > >> > > > 1. update clause columns
> > > > > > > > > >> > > > 2. update clause expressions
> > > > > > > > > >> > > > 3. where clause columns
> > > > > > > > > >> > > > 4. where clause expressions
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > I am thinking about going ahead with 1. Please let
> > me
> > > > know
> > > > > > if
> > > > > > > > any
> > > > > > > > > >> other
> > > > > > > > > >> > > > option is possible and whether such a
> functionality
> > > > > already
> > > > > > > > exists
> > > > > > > > > >> in
> > > > > > > > > >> > > some
> > > > > > > > > >> > > > other class.
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > Thanks.
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > -Bhupesh
> > > > > > > > > >> > > >
> > > > > > > > > >> > >
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>