jmnatzaganian edited a comment on issue #2498: URL: https://github.com/apache/hudi/issues/2498#issuecomment-974507962
I'm also having the same type of issue in EMR 6.4 after building and deploying Hudi 0.9.0. Note that, as mentioned [above](https://github.com/apache/hudi/issues/2498#issuecomment-969228521), the default binaries work just fine (EMR 6.4 with Hudi 0.8.0), so it seems likely that something is off with the build or referencing. I built with `mvn clean package -DskipTests -Dspark3 -Dscala-2.12 -T 30`.

What's really interesting is that I can create an MoR table without issue, but after a `load` the DF looks like it has loaded and then becomes unusable.

This [tip](https://github.com/apache/hudi/issues/2498#issuecomment-942282671) also worked for me (i.e. using `spark.sql` and referencing the table from the Glue data catalog). Unfortunately, querying the data this way seems to be *much* slower than in 0.8.0.

I documented my build and installation process in [this](https://apache-hudi.slack.com/archives/C4D716NPQ/p1637354714476100) Slack thread.

Edit: I tested this with a CoW table and did not have the issue, i.e. the following works just fine. It did, however, take 2.7x longer to do the read than it did in 0.8.0.

````
df = spark.read.format("org.apache.hudi").load(path)
df.show()
````

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
