xushiyan commented on issue #6808:
URL: https://github.com/apache/hudi/issues/6808#issuecomment-1310314615
@schlichtanders the issue is rooted in
```
"--conf spark.hadoop.javax.jdo.option.ConnectionURL='jdbc:derby:memory:databaseName=metastore_db;create=true'",  # noqa
```
Here `memory` is set as the subsubprotocol, so Derby keeps the database in
memory only and nothing is persisted. Leave the subsubprotocol out, as in
`jdbc:derby:databaseName=metastore_db;create=true`, so that Derby uses the
default `directory` mode, which persists to the file system. See
https://db.apache.org/derby/docs/10.14/ref/rrefjdbc37352.html
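
As a rough sketch of the change described above, the snippet below strips the `memory:` subsubprotocol from a Derby URL and rebuilds the `--conf` value. The helper names `to_directory_mode` and `metastore_conf_arg` are made up for illustration and are not part of Spark, Hive, or Hudi.

```python
DERBY_PREFIX = "jdbc:derby:"
MEMORY_SUBSUBPROTOCOL = "memory:"


def to_directory_mode(url: str) -> str:
    """Drop the `memory:` subsubprotocol so Derby falls back to `directory` mode.

    Hypothetical helper: plain string handling, no Derby or Spark calls.
    """
    if not url.startswith(DERBY_PREFIX):
        raise ValueError(f"not a Derby JDBC URL: {url}")
    rest = url[len(DERBY_PREFIX):]
    if rest.startswith(MEMORY_SUBSUBPROTOCOL):
        rest = rest[len(MEMORY_SUBSUBPROTOCOL):]
    return DERBY_PREFIX + rest


def metastore_conf_arg(url: str) -> str:
    """Format the URL as the value that follows `--conf`, quoted as in the issue."""
    return f"spark.hadoop.javax.jdo.option.ConnectionURL='{url}'"


in_memory = "jdbc:derby:memory:databaseName=metastore_db;create=true"
persistent = to_directory_mode(in_memory)
print("--conf", metastore_conf_arg(persistent))
```

With the persistent URL, Derby creates the database directory relative to the working directory of the JVM, so tests that need a clean metastore may want to delete that directory between runs.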
Another note: the embedded driver has a limitation where only one process
(JVM) can keep that database open at a time. If you run a spark-shell with
sample code like the one below to perform hive-sync while another process
still holds the database, you'll run into
`Another instance of Derby may have already booted the database`
https://github.com/apache/hudi/blob/6508b11d7c1c1e4cb22aac86f9977cd951a91c9b/packaging/bundle-validation/spark_hadoop_mr/write.scala
So it really depends on how you set up the unit tests. For functional tests,
which usually involve some local servers and multiple processes, the Derby
client driver (connecting to a Derby network server) works better, and it's
much more lightweight than Postgres.
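
For the client-driver setup, the matching settings might look roughly like the sketch below. It assumes a Derby Network Server is already running on `localhost` at its default port 1527 and that the Derby client jar is on the Spark classpath. The host, port, and the helper name `derby_client_conf` are placeholders, not something taken from this issue.

```python
from typing import List


def derby_client_conf(host: str = "localhost", port: int = 1527,
                      database: str = "metastore_db") -> List[str]:
    """Build `--conf` arguments for a metastore served by a Derby Network Server.

    Hypothetical helper: host, port and database name are assumed values.
    """
    url = f"jdbc:derby://{host}:{port}/{database};create=true"
    settings = {
        "spark.hadoop.javax.jdo.option.ConnectionURL": url,
        # Client driver class shipped with Derby 10.14 (derbyclient.jar).
        "spark.hadoop.javax.jdo.option.ConnectionDriverName":
            "org.apache.derby.jdbc.ClientDriver",
    }
    args: List[str] = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


print(derby_client_conf())
```

Because the server process owns the database, several client processes can connect to it at once, which is what avoids the single-process limitation of the embedded driver.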