Hmm, so I can query multi-tenant tables under a certain tenantId just fine, but when I query a multi-tenant view I created, I get a "table undefined" error. This is because I am not actually connecting as the tenant (I realize now that the predicate doesn't actually connect you as the tenant). It appears, then, that there is no way to connect as a tenant with the phoenix-spark integration, and therefore we cannot read a tenant-specific view. Is this correct?
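To make the contrast above concrete, here is a minimal, untested sketch of the two read paths being compared; the table/view names and the ZooKeeper host are hypothetical placeholders, and the behavior described in the comments is what Nico reports rather than anything verified here:

    import org.apache.phoenix.spark._

    // Works: read the base multi-tenant table and push the tenant filter down
    // as a predicate. This filters rows; it does not open a tenant connection.
    val baseRows = sc.phoenixTableAsRDD(
      "MULTI_TENANT_TABLE",                      // hypothetical base table
      Seq("TENANT_ID", "COL1"),
      predicate = Some("TENANT_ID = '123456789'"),
      zkUrl = Some("zk-server:2181")
    )

    // Reported to fail with "table undefined": the tenant-specific view only
    // resolves on a connection opened with the TenantId property, which this
    // call never sets.
    val viewRows = sc.phoenixTableAsRDD(
      "MY_TENANT_VIEW",                          // hypothetical tenant view
      Seq("COL1"),
      zkUrl = Some("zk-server:2181")
    )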
On Fri, Oct 7, 2016 at 1:28 PM, Nico Pappagianis <[email protected]> wrote:

Thanks Josh,

The above works for multi-tenant tables that I have created for testing, but I'm still running into issues with multi-tenant tables created elsewhere. I'll try to track down the DDL used to create the tables I'm having problems with - maybe that will illuminate some more things.

Thanks for pointing out the use of predicate, I wasn't familiar with that!


On Fri, Oct 7, 2016 at 12:14 PM, Josh Mahonin <[email protected]> wrote:

Hi Nico,

To elaborate a little bit, there are two options for interfacing with Phoenix from Spark: either using Spark's built-in JDBC integration, which uses the regular Phoenix JDBC driver, or using the phoenix-spark integration, which leverages Phoenix's MapReduce support. If you are able to use the JDBC integration for your use case, then you are likely able to use the 'TenantId' and other connection parameters as per the JDBC documentation.

However, if you are using the phoenix-spark integration, you can only pass in a ZooKeeper URL, a table, a sequence of columns, and a Hadoop Configuration object. If you have a DataFrame where the tenant column is 'TID' and want to retrieve only data for that tenant, you can use the DataFrame pushdown support, via something like:

    df.where(df("TID") === "some_tenant")

Or, if you're using RDDs directly:

    sc.phoenixTableAsRDD(
      table = "...",
      columns = Seq(...),
      predicate = Some("TID = 'FOO'"),
      zkUrl = Some("zk-server:2181")
    )

For saving, you'll need to ensure that your DataFrame schema includes the tenant ID as the leading column, and then you can use something like:

    df.write
      .format("org.apache.phoenix.spark")
      .mode("overwrite")
      .option("table", "...")
      .option("zkUrl", "...")
      .save()

Or, for RDDs:

    rdd.saveToPhoenix(
      tableName = "...",
      cols = Seq("TID", ...),
      zkUrl = Some("zk-server:2181")
    )

Good luck,

Josh


On Fri, Oct 7, 2016 at 2:28 PM, Nico Pappagianis <[email protected]> wrote:

Hi Josh, just saw your post. Let me take a look at that Stack Overflow post you linked. Could you elaborate on how you pass TenantId manually?

Thanks


On Fri, Oct 7, 2016 at 11:27 AM, Nico Pappagianis <[email protected]> wrote:

Thanks James,

I am able to read from the tenant-specific view using the SQLContext.read function, which gives a DataFrameReader to work with. However, the SQLContext class doesn't appear to have any functionality for writing. SQLContext has, so far, been the only way I've been able to read a tenant view. The phoenix-spark integration doesn't appear to pass along the TenantId, at least in the same manner as we would pass it when connecting to ZK via SQuirreL or SQLContext.

But maybe I am missing something.


On Fri, Oct 7, 2016 at 10:14 AM, James Taylor <[email protected]> wrote:

Not sure if the phoenix-spark integration propagates connection properties, but an alternate way of specifying the tenant ID is through that mechanism (which would work around the URL-building issue).
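To make James's suggestion concrete: a minimal, untested sketch of passing the tenant ID through the Hadoop Configuration object that phoenix-spark accepts alongside the table, columns and zkUrl. Whether the "TenantId" property actually gets propagated down to the Phoenix connection is exactly the open question in this thread; the view name, tenant ID and ZooKeeper host below are hypothetical.

    import org.apache.hadoop.conf.Configuration
    import org.apache.phoenix.spark._

    // Set the Phoenix "TenantId" connection property on the Configuration
    // handed to the phoenix-spark read; propagation is unverified here.
    val conf = new Configuration()
    conf.set("TenantId", "123456789")

    val rows = sc.phoenixTableAsRDD(
      "MY_TENANT_VIEW",              // hypothetical tenant-specific view
      Seq("COL1", "COL2"),
      zkUrl = Some("zk-server:2181"),
      conf = conf
    )

If the property is dropped somewhere in the MapReduce input setup, this would still fail with the same "table undefined" error described at the top of the thread.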
On Fri, Oct 7, 2016 at 10:06 AM, Nico Pappagianis <[email protected]> wrote:

Hi James,

At first I thought it was the DataFrame vs. RDD implementations, but looking closer my bet is on the way Spark connects to Phoenix. When reading via SQLContext I pass in

    "url" -> "jdbc:phoenix:zkHost1,zkHost2,zkHost3:zkPort;TenantId=123456789"

and it connects as the tenant.

However, SQLContext does not have a write/save function.

When I try to save by other means, I am required to pass in a value for "zkUrl" (not "url"). "zkUrl" cannot have the "jdbc:phoenix:" portion attached (because it appends zkPort to the end of "jdbc:phoenix" and errors out). As such I cannot connect as the tenant.

When connecting as the tenant via the SQuirreL client I use the same "url" string above, and it works.

So to me it appears to be an issue of how to connect to Phoenix as the tenant via Spark / the phoenix-spark integration. I have not found a clear-cut way to do so.

Thanks,
-Nico


On Fri, Oct 7, 2016 at 9:03 AM, James Taylor <[email protected]> wrote:

Hi Nico,

You mentioned offline that it seems to be working for data frames, but not RDDs. Can you elaborate on that? Have you confirmed whether the TenantId connection property is being propagated down to the Phoenix connection opened for the Spark integration?

Thanks,
James


On Thu, Oct 6, 2016 at 8:36 PM, Nico Pappagianis <[email protected]> wrote:

Does the phoenix-spark integration support multitenancy? I'm having a hard time getting it working on my tenant-specific view.

Thanks
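For completeness, a sketch of the JDBC read path that Nico's 10:06 AM message describes as working: Spark's built-in JDBC data source with the TenantId embedded in the Phoenix JDBC URL, so the connection is opened as the tenant and the tenant view resolves. This assumes the Phoenix client jar is on the Spark classpath; the view name and hosts are placeholders, and the write path remains the open problem in this thread.

    // Read a tenant-specific view through Spark's built-in JDBC integration.
    // The TenantId connection property rides along in the JDBC URL.
    val df = sqlContext.read
      .format("jdbc")
      .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
      .option("url", "jdbc:phoenix:zkHost1,zkHost2,zkHost3:zkPort;TenantId=123456789")
      .option("dbtable", "MY_TENANT_VIEW")   // hypothetical tenant view name
      .load()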
