[ https://issues.apache.org/jira/browse/SPARK-20794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-20794.
-------------------------------
    Resolution: Invalid

It's a question, so it belongs on the mailing list. I think it's a DASHDB question. show just picks from the first partition of the underlying data source.

> Spark show() command on dataset does not retrieve consistent rows from DASHDB data source
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-20794
>                 URL: https://issues.apache.org/jira/browse/SPARK-20794
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Sahana HA
>            Priority: Minor
>
> When the user creates a DataFrame from the DASHDB data source (a relational
> database) and executes df.show(5), it returns different rows on each
> execution. We are aware that show(5) picks the first 5 rows from an existing
> partition, and hence the result is not guaranteed to be consistent across
> executions.
> However, when we run the same show(5) command against S3 storage or Bluemix
> Object Store (non-relational data sources), we always get the same rows, in
> the same order, on each execution.
> We just wanted to confirm why DASHDB differs from other data sources such as
> S3 or Bluemix Object Store. Is the issue with Spark or with DASHDB alone? Or
> do all relational data sources return inconsistent rows?
> Repro snippet:
> // Load the data from dashdb
> val dashdb = sqlContext.read.format("packageName").options(dashdbreadOptions).load()
>
> // execution #1
> dashdb.show(5)
> +--------------------+------------+-----------------+-------+-----+-------------+------+---+--------------+------------+
> |        PRODUCT_LINE|PRODUCT_TYPE|CUST_ORDER_NUMBER|   CITY|STATE|      COUNTRY|GENDER|AGE|MARITAL_STATUS|  PROFESSION|
> +--------------------+------------+-----------------+-------+-----+-------------+------+---+--------------+------------+
> |Personal Accessories|     Eyewear|           107861|Rutland|   VT|United States|     F| 39|       Married|       Sales|
> |   Camping Equipment|    Lanterns|           189003| Sydney|  NSW|    Australia|     F| 20|        Single| Hospitality|
> |   Camping Equipment|Cooking Gear|           107863| Sydney|  NSW|    Australia|     F| 20|        Single| Hospitality|
> |Personal Accessories|     Eyewear|           189005|Villach|   NA|      Austria|     F| 37|       Married|Professional|
> |Personal Accessories|     Eyewear|           107865|Villach|   NA|      Austria|     F| 37|       Married|Professional|
> +--------------------+------------+-----------------+-------+-----+-------------+------+---+--------------+------------+
> only showing top 5 rows
>
> // execution #2
> dashdb.show(5)
> +--------------------+------------+-----------------+------------+-----+--------------+------+---+--------------+-----------+
> |        PRODUCT_LINE|PRODUCT_TYPE|CUST_ORDER_NUMBER|        CITY|STATE|       COUNTRY|GENDER|AGE|MARITAL_STATUS| PROFESSION|
> +--------------------+------------+-----------------+------------+-----+--------------+------+---+--------------+-----------+
> |Mountaineering Eq...|       Tools|           112835|  Portsmouth|   NA|United Kingdom|     M| 24|        Single|      Other|
> |   Camping Equipment|Cooking Gear|           193902|Jacksonville|   FL| United States|     F| 22|        Single|Hospitality|
> |   Camping Equipment|       Packs|           112837|Jacksonville|   FL| United States|     F| 22|        Single|Hospitality|
> |Mountaineering Eq...|        Rope|           193904|Jacksonville|   FL| United States|     F| 31|       Married|      Other|
> |      Golf Equipment|     Putters|           112839|Jacksonville|   FL| United States|     F| 31|       Married|      Other|
> +--------------------+------------+-----------------+------------+-----+--------------+------+---+--------------+-----------+
> only showing top 5 rows
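
A minimal sketch of how the row order could be made explicit, following the resolution note that show takes rows from the first partition of the source. A relational source is not required to return rows in any particular order unless the query sorts them, so sorting on a key before show(5) pins down which rows count as the first five. The object name DeterministicShow, the helper firstOrderNumbers, and the assumption that CUST_ORDER_NUMBER is unique are illustrative and not taken from the report; the in-memory data is only a stand-in for the DASHDB table.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DeterministicShow {
  // Hypothetical helper: sort on a key so that "first n" is well defined.
  // Assumes CUST_ORDER_NUMBER is unique; tied keys could still come back in any order.
  def firstOrderNumbers(df: DataFrame, n: Int): Seq[Int] =
    df.orderBy("CUST_ORDER_NUMBER")
      .select("CUST_ORDER_NUMBER")
      .take(n)
      .map(_.getInt(0))
      .toSeq

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("deterministic-show")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the DASHDB table: a few rows from the report, spread over partitions.
    val orders = Seq(
      (107861, "Rutland"),
      (189003, "Sydney"),
      (112835, "Portsmouth"),
      (193902, "Jacksonville")
    ).toDF("CUST_ORDER_NUMBER", "CITY").repartition(3)

    // With the explicit sort, the same two rows are shown on every run.
    orders.orderBy("CUST_ORDER_NUMBER").show(2)
    println(firstOrderNumbers(orders, 2))

    spark.stop()
  }
}
```

The sort costs a full pass over the data, so this is a trade of speed for repeatability; for a quick look at the data, an unsorted show(5) remains cheaper.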