[jira] [Commented] (PHOENIX-3506) Phoenix-Spark plug in cannot select by column family name

Jira Thu, 06 May 2021 23:40:10 -0700


    [ 
https://issues.apache.org/jira/browse/PHOENIX-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340607#comment-17340607
 ]


Mariusz Szpatuśko commented on PHOENIX-3506:
--------------------------------------------

In phoenix-connectors 5.1.1 the problem seems to be resolved. The syntax for a 
column in a non-default column family is col("`A.STATUS`").alias("status"); for 
a column in the default family it is col("ID"). df.show() also works with the 
column family name as a prefix in the column name.
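
To illustrate the quoting rule described above, here is a minimal sketch in 
Java. The class PhoenixColumnNames and its method sparkName are hypothetical 
names made up for this example; they are not part of Spark or 
phoenix-connectors. The sketch only builds the string that would be passed to 
Spark's col(...), on the assumption that a null or empty family means the 
default column family.

```java
// Hypothetical helper for illustration only; not part of Spark or
// phoenix-connectors.
final class PhoenixColumnNames {
    private PhoenixColumnNames() {}

    /**
     * Builds the column name to hand to Spark's col(...).
     * Assumption: a null or empty family denotes the default column family,
     * whose columns are referenced by the bare qualifier (for example ID).
     */
    static String sparkName(String family, String qualifier) {
        if (family == null || family.isEmpty()) {
            return qualifier;
        }
        // Backticks make Spark read FAMILY.QUALIFIER as one column name
        // instead of treating the dot as nested-field access.
        return "`" + family + "." + qualifier + "`";
    }
}
```

With a static import of org.apache.spark.sql.functions.col, the select from 
the comment could then be written as 
df.select(col(PhoenixColumnNames.sparkName("A", "STATUS")).alias("status"), 
col(PhoenixColumnNames.sparkName(null, "ID"))).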

> Phoenix-Spark plug in cannot select by column family name
> ---------------------------------------------------------
>
>                 Key: PHOENIX-3506
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3506
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Xindian Long
>            Priority: Major
>
> I have a table with multiple column families that may share column names.
> I want to use the phoenix-spark plugin to select some of the fields, but it 
> throws an AnalysisException (details in the attached file).
> It works when no column family is used, but I expect not to have to make 
> column names unique across different column families.
> I used the following code:
> ----
> public void testSpark(JavaSparkContext sc, String tableStr, String dataSrcUrl) {
>     //SparkContextBuilder.buildSparkContext("Simple Application", "local");
>     // One JVM can only have one Spark Context now
>     Map<String, String> options = new HashMap<String, String>();
>     SQLContext sqlContext = new SQLContext(sc);
>     options.put("zkUrl", dataSrcUrl);
>     options.put("table", tableStr);
>     log.info("Phoenix DB URL: " + dataSrcUrl + " tableStr: " + tableStr);
>     DataFrame df = null;
>     try {
>         df = sqlContext.read()
>                 .format("org.apache.phoenix.spark")
>                 .options(options)
>                 .load();
>         df.explain(true);
>         df.show();
>         df = df.select("I.CI", "I.FA");
>         // This gives the same exception too:
>         //df = df.select("\"I\".\"CI\"", "\"I\".\"FA\"");
>     } catch (Exception ex) {
>         log.error("sql error: ", ex);
>     }
>     try {
>         log.info("Count By phoenix spark plugin: " + df.count());
>     } catch (Exception ex) {
>         log.error("dataframe error: ", ex);
>     }
> }
>  -----
>  
> I can see in the log that there is something like
>  
> 10728 [INFO] main  org.apache.phoenix.mapreduce.PhoenixInputFormat  - Select 
> Statement: SELECT 
> "RID","I"."CI","I"."FA","I"."FPR","I"."FPT","I"."FR","I"."LAT","I"."LNG","I"."NCG","I"."NGPD","I"."VE","I"."VMJ","I"."VMR","I"."VP","I"."CSRE","I"."VIB","I"."IIICS","I"."LICSCD","I"."LEDC","I"."ARM","I"."FBM","I"."FTB","I"."NA2FR","I"."NA2PT","S"."AHDM","S"."ARTJ","S"."ATBM","S"."ATBMR","S"."ATBR","S"."ATBRR","S"."CS","S"."LAMT","S"."LTFCT","S"."LBMT","S"."LDTI","S"."LMT","S"."LMTN","S"."LMTR","S"."LPET","S"."LPORET","S"."LRMT","S"."LRMTP","S"."LRMTR","S"."LSRT","S"."LSST","S"."MHDMS0","S"."MHDMS1","S"."RFD","S"."RRN","S"."RRR","S"."TD","S"."TSM","S"."TC","S"."TPM","S"."LRMCT","S"."SS13FSK34","S"."LERMT","S"."LEMDMT","S"."AGTBRE","S"."SRM","S"."LTET","S"."TPMS","S"."TPMSM","S"."TM","S"."TMF","S"."TMFM","S"."NA2TLS","S"."NA2IT","S"."CWR","S"."BPR","S"."LR","S"."HLB","S"."NA2UFTBFR","S"."DT","S"."NA28ARE","S"."RM","S"."LMTB","S"."LRMTB","S"."RRB","P"."BADUC","P"."UAN","P"."BAPS","P"."BAS","P"."UAS","P"."BATBBR","P"."BBRI","P"."BLBR","P"."ULHT","P"."BLPST","P"."BLPT","P"."UTI","P"."UUC"
>  FROM TESTING.ENDPOINTS
>  
> But evidently the column family is dropped from the DataFrame column name 
> somewhere in the process.
> A fix is needed so that columns can be selected by 
> ColumnFamilyName.ColumnQualifier.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
