Hello back,

I've worked some more on the dreaded DERBY-2469 RFE, and I'm here to share my discoveries and ask some key questions that will help me understand whether this is really doable or not.

Discoveries and thoughts

1) It's very hard to test JNLP applications, and so it's very hard to test my JNLP StorageFactory. This is because the JNLP services such as PersistenceServiceImpl and BasicServiceImpl only get loaded at JVM startup, when java is launched from the javaws executable, which needs a jnlp file. As a result, automated testing or debugging with an IDE is out of the question.

I've tried manually including my Mac OS X Java 6 javaws.jar (which contains the system-dependent implementation classes for the JNLP interfaces) and manually initializing and loading the services, but it all fails horribly because several things are missing (mostly system properties that the impl classes expect, and which are probably set up by javaws or some other yet-to-be-found class).

It can _PROBABLY_ be done by tinkering with it, going exception after exception and manually putting in the tidbits the impl classes expect, but it's for sure long and tedious work, and it would be strongly platform-dependent (for example, I've noticed the impl classes in javaws vary considerably across platforms, and so do the system properties JNLP expects). Or maybe, if we are REALLY lucky, we can find a JNLPInitializerWhatever class with a 'startmeup' method that sets everything up. Dreaming is not a crime, you know? :P

Any help here is SERIOUSLY appreciated. Yes, I know I should post on a Java ML; I'm still figuring out WHERE exactly the right place to post this is. Again, any help here is appreciated.

2) PersistenceService has one big limitation which wasn't apparent from reading the javadoc, but which I discovered after some heavy testing with it.

You can only create storage entities directly under your application's codebase or under one of its parent codebases. If, for example, your application codebase is http://db.apache.org/derby, it is legal to create persistent entities such as http://db.apache.org/derby/FILE or http://db.apache.org/FILE (this is done to allow sharing between applications from the same vendor/domain), but IT IS NOT LEGAL to create http://db.apache.org/derby/MYSUBDIR/FILE, just as it is of course not legal to create http://www.otherdomain.com/FILE.

This effectively means I won't be able to use the JNLP URL as the file path as I initially planned (because I can't create a hierarchy; I can only save entities at the root level). Instead I will have to store entities in a flat namespace and transparently translate that into a full-fledged hierarchy.

I plan to do this by using a 'name' hierarchy such as http://db.apache.org/derby/directory,subdir1,subdir2,filename. I've checked the valid separator chars for URLs and the best choice seems to be ',' since it's more readable than, for example, '|', yet it's still VERY VERY rare in common file names (I seriously hope derby doesn't create my,file,name files).
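
A minimal sketch of this path-to-name mapping, under stated assumptions: the class and method names (FlatNames, encode, toEntryUri) are hypothetical, '/' is assumed to be the separator of the incoming hierarchical path, and names that already contain ',' are simply rejected rather than escaped.

```java
import java.net.URI;

/** Hypothetical helper: flattens a hierarchical path into one entry name. */
final class FlatNames {
    /** Separator used inside the flat entry name, as proposed above. */
    static final char SEPARATOR = ',';

    private FlatNames() {}

    /** Turns "dir/sub/file" into "dir,sub,file". */
    static String encode(String path) {
        if (path.indexOf(SEPARATOR) >= 0) {
            // Assumption: reject instead of escaping, to keep the sketch small.
            throw new IllegalArgumentException(
                    "path contains reserved '" + SEPARATOR + "': " + path);
        }
        // Drop leading and trailing slashes, then collapse each run of '/'.
        String trimmed = path.replaceAll("^/+|/+$", "");
        return trimmed.replaceAll("/+", String.valueOf(SEPARATOR));
    }

    /** Builds the flat entry URI directly under the given codebase. */
    static URI toEntryUri(URI codebase, String path) {
        String base = codebase.toString();
        if (!base.endsWith("/")) {
            base += "/";
        }
        return URI.create(base + encode(path));
    }
}
```

For example, FlatNames.encode("seg0/c10.dat") yields "seg0,c10.dat", so every entry stays directly under the codebase. The URI can be converted with toURL() before being handed to a URL-based API.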

At first I won't add any supporting data structure, but will do everything with this flat structure, parsing the entire list each time to find, for example, the files available in a given directory (scan the entire list, find the names that begin with "codebase/dir,"). Later, if performance becomes an issue, I could easily throw in a Tree for fast seek/list operations, generated at startup time and kept up to date with each operation, but I prefer to start with a simpler approach and deal with performance problems later, if they come up.
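
The full-scan listing could look roughly like the sketch below. It assumes the entry names are already available as a String[] relative to the codebase, and that directories have their own placeholder entries; FlatListing and list are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the direct children of a flat 'directory'. */
final class FlatListing {
    private FlatListing() {}

    /**
     * Scans every entry name and keeps those exactly one level below dir.
     * An empty dir means the root of the flat namespace.
     */
    static List<String> list(String[] entryNames, String dir) {
        String prefix = dir.isEmpty() ? "" : dir + ",";
        List<String> children = new ArrayList<String>();
        for (String name : entryNames) {
            if (!name.startsWith(prefix)) {
                continue; // not under this directory
            }
            String rest = name.substring(prefix.length());
            // Skip the directory itself and anything nested deeper.
            if (!rest.isEmpty() && rest.indexOf(',') < 0) {
                children.add(rest);
            }
        }
        return children;
    }
}
```

Each call costs one pass over all names, which is the simple-first trade-off described above. A tree index could later replace the loop without changing the method signature.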

Moreover, I don't really think derby creates so many files that a full list scan for a simple list operation would make the StorageFactory so damn slow. But you know: first make it work, then make it fast.

3) PersistenceServiceImpl, which is Sun's standard implementation of the PersistenceService interface, only allows 255 storage entities for a given codebase. After some decompiling (thank god JAD exists, since downloading the JVM sources from Sun's site is presently a nightmare) I found out it keeps an internal array of size 255. Very very dumb, I know. If you try to register a 256th entity, it fails horribly with an ArrayIndexOutOfBoundsException.

This can't be overcome easily: swapping that implementation for a more capable one would mean figuring out how to replicate the 'sandbox friendly' persistence of the impl class, which would be a nightmare. And anyway, we don't really want to replace Sun's implementation; the main focus here is to make derby work with the tools Sun gives us, instead of working around them.

Another way around this limit would be to save SEVERAL 'files' in a single storage entity, to cut down the number of entities (especially since there are no size limitations; the implementation just warns/asks the user when more than the default maximum size is requested). But this would complicate the StorageFactory/StorageFile implementations by several orders of magnitude, since I would have to keep 'pointers' to each file's start and end inside each persistent entity, etc. This isn't easy, and it isn't something I want to do unless there is no other viable way.

Also, maybe opening a bug about this limitation would get Sun to replace the default implementation with a proper one (hey, use an ArrayList, man!). I think this could be done easily and would have no downsides, especially since we already have size limits, so there's no need for a 'number of files' limit.

4) Luckily, there are no size limitations, either for a single entity or for the whole storage. There is just a default maximum size threshold, and the user gets a request popup when the storage grows past it.

OK, now on to the questions/doubts

A) StorageFactory.shutdown() - should a proper implementation delete the 'temporary' files if they are made persistent? I'm currently planning to implement temp files as standard storage entities, just 'tagged' with PersistenceService.TEMPORARY, but since this tag does nothing automatically, I think I will have to manually delete temporary entities at startup/shutdown. Another approach would be to keep the temp files in memory and not in the persistent storage: this would save size/number of files and would make the cleanup unnecessary.

B) StorageFactory.init() - Let me see if I got this right.

Home is the derby home directory (where all databases are stored), but it can be ignored.
DatabaseName is the subdir in home for the given database.
TempDirName is the home for temp files (can temporary files also be created outside of it, for example in the database directory? Is this possible?)
UniqueName is the database-specific subdirectory inside tempDirName.

Home shouldn't be created but provided (though in my case I have to create it since the user can't). DatabaseName can be null when you are using the StorageFactory just to access the database directories (will this happen with my factory as well?), but if it's not null I have to create home/databaseName. TempDirName can be null, in which case a default should be used and the directory created, but ONLY if uniqueName is not null. If uniqueName is null, then no temp dir is available and, from what I can guess, it means Derby won't use ANY TEMPORARY FILE AT ALL. Right?

Also, please use better names next time. IMHO home, databaseHome, temp, databaseTemp would have been a wiser choice, more easily understood by non-derbiers. Anyway.

C) StorageFactory.newStorageFile(String path) - should this method also create a temporary file if the given path is under the temp dir? This isn't clear IMHO: if they hand me a tempDir path, do I just have to wrap it in a StorageFile, or do I also have to create a temporary file with a unique name and return it? I'm really not clear on all the newStorageFile/createTemporaryFile methods, and even after reading the javadocs and looking at BaseStorageFactory, I still feel puzzled.

Can someone help me understand this? I'd love that :)

D) This is hard to explain, but I will try my best. Since the PersistenceService API only gives me a 'name' as metadata for a given storage entity, and I prefer not to save metadata inside the file contents themselves (this would make things harder to implement), I was asking myself which file metadata I have to keep.

Surely, first of all, I would need to tell whether a storage entity is temporary or not. If temp files get created ONLY under the tempDir, I could use the name/path itself to tell this. But I'm still wondering whether derby also creates temporary files outside tempDir or not. I think and hope not.

Also, I surely have to tell whether a given storage entity is a directory or a file. I could tell them apart by the name/path as well (if it ends with SEPARATOR, it's a dir; if it doesn't, it's a file), but this would work ONLY if derby doesn't build directory/file paths on its own by parsing the path instead of using the StorageFactory/StorageFile methods. If derby does, then it could produce a directory URL without the trailing separator, which would mess things up.
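
The two name-based checks discussed so far (temporary by location, directory by trailing separator) can be sketched as below. This assumes flat names using ',' as SEPARATOR and a temp dir given as a flat name; EntryKind and its methods are hypothetical, and the sketch inherits the caveat above: it only works if every name is produced consistently.

```java
/** Hypothetical helper: classifies a flat entry purely from its name. */
final class EntryKind {
    static final char SEPARATOR = ',';

    private EntryKind() {}

    /** A directory is marked by a trailing separator, e.g. "seg0,". */
    static boolean isDirectory(String name) {
        return !name.isEmpty() && name.charAt(name.length() - 1) == SEPARATOR;
    }

    /** An entry is temporary if it lives under the temp dir's flat name. */
    static boolean isTemporary(String name, String tempDirName) {
        String prefix = isDirectory(tempDirName)
                ? tempDirName
                : tempDirName + SEPARATOR;
        // Appending the separator keeps "tmpfoo,x" from matching "tmp".
        return name.startsWith(prefix);
    }
}
```

Both checks need no stored metadata at all, which is their appeal. The cost is that a directory name arriving without its trailing separator is silently treated as a file.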

Another approach would be to use a zero maximum length as the directory tag. A directory would have a zero length (which is also good, since I'll create directory storage entities just as placeholders, to tell whether a directory was created or not), while a file would start with a default maximum length of 1024/whatever (remember that this can easily grow when needed, as I write to the entity).

Then there is also a readonly bit on the files. Do I have to persist this as well? Would it be a problem if I 'lose' this information and don't persist it? Also: are there any other file metadata bits which I'm forgetting and which I should save in the persistent storage?

Thanks again for your help. If I get some answers that shed some light on my doubts, I will fix a couple of things in the next few hours and post an 'alpha' JNLPStorageFactory/StorageFile patch to the jira issue.

Also, while working on it I've found that there are SEVERAL similarities between JNLP storage and a hypothetical memory storage using a simple Map<Path, File>, so I'm implementing things around an abstract base class which delegates only a couple of CRUD methods (create/delete/rename/etc...) to the extending implementation classes and builds all the StorageFactory/StorageFile logic on top of these.
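
A rough sketch of that layering, under stated assumptions: all names here (AbstractFlatStore, MemoryStore, deleteAll) are hypothetical, entries are keyed by flat ','-separated names, and only four primitives are delegated. The recursive delete stands in for the higher-level logic built on top of them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical base class: subclasses supply only a few primitives. */
abstract class AbstractFlatStore {
    protected abstract boolean create(String name);
    protected abstract boolean delete(String name);
    protected abstract boolean exists(String name);
    /** Must return a snapshot, so callers may delete while iterating. */
    protected abstract List<String> names();

    /** Higher-level operation written once, on top of the primitives. */
    public boolean deleteAll(String dir) {
        if (!exists(dir)) {
            return false;
        }
        String prefix = dir + ",";
        for (String name : names()) {
            if (name.startsWith(prefix)) {
                delete(name);
            }
        }
        return delete(dir);
    }
}

/** In-memory variant: a plain map from flat name to contents. */
final class MemoryStore extends AbstractFlatStore {
    private final Map<String, byte[]> entries = new HashMap<String, byte[]>();

    @Override
    protected boolean create(String name) {
        if (entries.containsKey(name)) {
            return false;
        }
        entries.put(name, new byte[0]);
        return true;
    }

    @Override
    protected boolean delete(String name) {
        return entries.remove(name) != null;
    }

    @Override
    protected boolean exists(String name) {
        return entries.containsKey(name);
    }

    @Override
    protected List<String> names() {
        return new ArrayList<String>(entries.keySet());
    }
}
```

A JNLP-backed subclass would implement the same four methods against the persistence service, leaving deleteAll and similar logic untouched.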

This means I can probably get a working MemoryStorageFactory done along with the JNLP one, since the former would be only about 5% more work than the JNLP storage alone, the way I'm currently planning things.

Also, I think this could potentially lead to a massive StorageFactory/StorageFile redesign for easier storage implementations, or to some higher-level abstract class wrappers around the present StorageFactory/StorageFile interfaces, such as the one I'm doing, which depends only on a FEW storage methods instead of the 30+ methods one presently needs to implement to get a Storage working. Of course, given my very basic knowledge of derby internals, I can't really tell whether this is needed or a good idea.

Thanks again for any help/criticism/hint/whatever you may provide, and forgive me for my messy English and my very direct yet very verbose way of writing :P

Luigi
