Hi Jon, Did you make JIRA issues for the two improvements you suggested to be made?
Cheers Mike On 04/04/2017, 04:15, "Michael Pearce" <[email protected]> wrote: Hi Jon I think on issue 1 and 2 are valid improvements that can be made, add some Jira's for these. Looking at issue 1 it seems quite trivial for someone to contribute the solution for. On the question front: 1) from my understanding and using this plugin so far this is what the partion by interval allows you to control the number of partitions that would be generated, e.g. This is why the sample in document has 1 year so it only creates 2 fragments. You can also partition by an enum field here the number of partitions is the number of enums. 2) I haven't checked specifically, saying that I haven't run into an issue when trialling this myself. Obviously this is contributed by Devin he will be the definitive source here. Though I guess can always specially test these scenarios. Sent using OWA for iPhone ________________________________________ From: Jon Roberts <[email protected]> Sent: Monday, April 3, 2017 11:57:59 PM To: [email protected] Subject: Re: PXF JDBC plugin https://github.com/apache/incubator-hawq/tree/master/pxf/pxf-jdbc Issue 1: Security The example has the username and password in the connection string. LOCATION ('pxf://localhost:51200/demodb.myclass' '?PROFILE=JDBC' '&JDBC_DRIVER=com.mysql.jdbc.Driver' '&DB_URL=jdbc:mysql:// 192.168.200.6:3306/demodb&USER=root&PASS=root' ) Any chance we can get this changed to a connection profile that points to a file outside of the database? For Greenplum database and S3, the LOCATION syntax includes "config=/path/to/config_file". The config_file contains the S3 credentials. This seems like a good pattern to follow. As it is right now, anyone that can connect to the database will be able to see the username and password of the JDBC connection. Issue 2: Extra Properties Some JDBC drivers will need many additional properties beyond the URL and this requires setting it with a put to a Properties variable. An example of this is Oracle's defaultRowPrefetch property that needs to be updated from the default of 10 which is designed for OLTP to something larger like 2000 which is more ideal for data extracts. Additionally, you will need the ability to set the isolation level which is done with setTransactionIsolation on the Connection. I don't believe you can set this on the connection URL either. Many SQL Server and DB2 database still don't use snapshot isolation and use dirty reads instead. Without a dirty read here, your query will block modifications to the table with a "blocking lock". So for the configuration file will need both an extra properties variable that is delimited so you can multiple and a isolation level indicator. Questions. 1. How do you manage the maximum number of pxf instances? For example, if you partition like this: "PARTITION_BY=cdate:date&RANGE=2008-01-01:2010-01-01&INTERVAL=1:day", will you create 730 pxf instances and thus 730 concurrent queries to the source database? 2. Have you tested special characters like null, escape, carriage return, newline, and your delimiter? Maybe that is handled automatically by pxfwritable_import so this isn't an issue. Jon Roberts Principal Engineer | [email protected] | 615-426-8661 On Mon, Apr 3, 2017 at 3:18 PM, Michael Pearce <[email protected]> wrote: > I mean the readme file :) at that location. > ________________________________________ > From: Michael Pearce > Sent: Monday, April 3, 2017 9:15:48 PM > To: [email protected] > Subject: Re: PXF JDBC plugin > > On master there is the document that was added. > > https://github.com/apache/incubator-hawq/tree/master/pxf/pxf-jdbc > > Reading this I think answers some of the questions. > > > > > > Sent using OWA for iPhone > ________________________________________ > From: Vineet Goel <[email protected]> > Sent: Monday, April 3, 2017 6:33:07 PM > To: [email protected] > Subject: Re: PXF JDBC plugin > > Devin and others, > > It would be great if you can expand a bit more on the implementation or > point to a document? > > Thanks! > > > > On Mon, Apr 3, 2017 at 7:53 AM Jon Roberts <[email protected]> wrote: > > > JDBC PXF is pretty exciting. Are there details on how this works? > > > > Does PXF randomly pick a segment to connect to the JDBC source or do you > > specify a particular node to execute the query? I'm assuming it isn't a > > parallel query so you don't flood the JDBC source with x number of > "select > > *" queries but I could be wrong. > > > > How do you register the JDBC connection string with PXF? > > > > How do you manage credentials? > > > > How do you register the JDBC jar files? > > > > > > Jon Roberts > > > > On Sat, Apr 1, 2017 at 5:05 PM, Ed Espino <[email protected]> wrote: > > > > > Thank you Michael, this is a well-deserved report highlight. I have > added > > > the following to the report: > > > > > > 3. Community contribution highlight(s): > > > > > > * Leveraging the extensible PXF design, a JDBC PXF plugin > > > was contributed by Devin Jia (github id: jiadexin). This > > > contribution came from the community and not from the > > > company which originally donated HAWQ to the ASF. > > > > > > Warm regrds, > > > -=e > > > > > > On Sat, Apr 1, 2017 at 1:14 PM, Michael André Pearce < > > > [email protected]> wrote: > > > > > > > > > > > I would add note about the new JDBC PXF plugin being contributed by > > Devin > > > > Jia. (Kudos to him) > > > > > > > > It's a good show of the community contributing and growing outside of > > > just > > > > Pivotal staff (I think for incubator review this is important) and > also > > > the > > > > benefit of the extensible PXF design the project has done. > > > The information contained in this email is strictly confidential and for > the use of the addressee only, unless otherwise indicated. If you are not > the intended recipient, please do not read, copy, use or disclose to others > this message or any attachment. Please also notify the sender by replying > to this email or by telephone (+44(020 7896 0011) and then delete the email > and any copies of it. Opinions, conclusion (etc) that do not relate to the > official business of this company shall be understood as neither given nor > endorsed by it. IG is a trading name of IG Markets Limited (a company > registered in England and Wales, company number 04008957) and IG Index > Limited (a company registered in England and Wales, company number > 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill, > London EC4R 2YA. Both IG Markets Limited (register number 195355) and IG > Index Limited (register number 114059) are authorised and regulated by the > Financial Conduct Authority. > The information contained in this email is strictly confidential and for the use of the addressee only, unless otherwise indicated. If you are not the intended recipient, please do not read, copy, use or disclose to others this message or any attachment. Please also notify the sender by replying to this email or by telephone (+44(020 7896 0011) and then delete the email and any copies of it. Opinions, conclusion (etc) that do not relate to the official business of this company shall be understood as neither given nor endorsed by it. IG is a trading name of IG Markets Limited (a company registered in England and Wales, company number 04008957) and IG Index Limited (a company registered in England and Wales, company number 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill, London EC4R 2YA. Both IG Markets Limited (register number 195355) and IG Index Limited (register number 114059) are authorised and regulated by the Financial Conduct Authority.
