Since Jira e-mail is down, I am posting the following directly to
derby-dev. It is not really that important, but I did not want it to be
lost when the Jira system decides to send all the e-mail at once.
Currently Derby does not allow users to perform import/export using
the system procedures if the table contains clob/blob data types. I
think enhancing Derby to allow import/export of clob/blob data will
be useful to users.
Some thoughts on how to implement import/export of large objects in
Derby :
Currently Derby supports import/export using a delimited file format. I
think the same format can be used to perform import/export of large
object data as well. Most of the issues are common to both import and
export, for the obvious reason that one should be able to import data
into a Derby database if it was exported from Derby.
Large objects can be written to the same export file given by the user,
or they can be written to another (external) file with a reference
stored in the main export file.
Following sections discuss issues specific to each data type:
CLOB:
o Clob data is exported similarly to the other character data types,
except that it can also be written to an external file.
o Double delimiters are used when writing clob data to the export
file if the data contains the delimiter character. For example,
if the data is 'He said "it is a nice day"', it will be written to the
file as 'He said ""it is a nice day""'.
o If the clob data is written to an external file and only the
reference is stored in the main export file, double delimiters are
not required.
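The double-delimiter escaping described above could be sketched as
follows (the delimiter character and method names are illustrative, not
Derby's actual implementation):

```java
// Sketch of double-delimiter escaping for clob data written to the
// main export file. Assumes '"' is the character delimiter.
public class DelimiterEscape {
    static final char DELIM = '"';

    // Double every delimiter character found in the data (export side).
    static String escape(String data) {
        StringBuilder sb = new StringBuilder();
        for (char c : data.toCharArray()) {
            sb.append(c);
            if (c == DELIM) sb.append(DELIM); // inject a second delimiter
        }
        return sb.toString();
    }

    // Collapse doubled delimiters back to a single one (import side).
    static String unescape(String data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            sb.append(c);
            if (c == DELIM && i + 1 < data.length()
                    && data.charAt(i + 1) == DELIM) {
                i++; // skip the doubled delimiter
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String data = "He said \"it is a nice day\"";
        String escaped = escape(data);
        System.out.println(escaped); // He said ""it is a nice day""
        System.out.println(unescape(escaped).equals(data)); // true
    }
}
```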
BLOB:
o Blob data is written to the export file as it is stored in the
database; it is not converted to hex or any other format. Character
code-set conversion is not done for the binary data.
o If the binary data contains the delimiter character, storing it in
the same file with the other data types is a problem. It may not be
such a good idea to attempt to interpret binary data to find the
delimiters inside it and inject double delimiters into the data. That
leaves us with the following two options:
1) Allow import/export of blob data only when it is stored in
an external file.
2) Write the blob data to the export file along with the other data
types during export, assuming the blob data does not contain any
delimiters, and throw an error on import if delimiters are found
inside the data, by interpreting it using the same code-set as the
other character data.
o When the blob data is written to an external file, there is no need
to scan the data for delimiters during export/import.
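Writing blobs to an external file and reading them back by offset and
length could be sketched like this (the file handling and method names
are assumptions for illustration, not Derby code):

```java
import java.io.*;

// Sketch of writing blob data to an external lob file during export
// and reading it back by offset/length during import, so the binary
// data is never scanned for delimiters.
public class ExternalLobFile {

    // Append the blob bytes to the lob file; return the start offset.
    static long writeLob(File lobFile, byte[] data) throws IOException {
        long offset = lobFile.length();
        try (FileOutputStream out = new FileOutputStream(lobFile, true)) {
            out.write(data);
        }
        return offset;
    }

    // Read 'length' bytes starting at 'offset', as import would.
    static byte[] readLob(File lobFile, long offset, int length)
            throws IOException {
        byte[] data = new byte[length];
        try (RandomAccessFile raf = new RandomAccessFile(lobFile, "r")) {
            raf.seek(offset);
            raf.readFully(data);
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        File lobFile = File.createTempFile("pictures", ".dat");
        byte[] blob1 = {1, 2, 34, 5}; // may contain delimiter bytes
        byte[] blob2 = {6, 7, 8};
        long off1 = writeLob(lobFile, blob1);
        long off2 = writeLob(lobFile, blob2);
        System.out.println(off1 + " " + off2); // 0 4
        byte[] back = readLob(lobFile, off2, blob2.length);
        System.out.println(java.util.Arrays.equals(back, blob2)); // true
        lobFile.delete();
    }
}
```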
Handling large objects in an external file:
The advantage of importing/exporting clob/blob data using an external
file, different from the main export file, is that the data need not be
interpreted or modified if there are delimiters inside the large object
data. Import/export of large objects will also perform better because
the data is not scanned. The main import/export file will contain
references to where the lobs are located.
Large object reference File Format:
The large object location reference will contain the file name, the
starting offset of the large object inside the file, and the length of
the large object data. The format will be something like
fileName.ext.StartOffset.Length, as a string. For example:
pictures.dat.100.999
The reference will be stored in the main export file. Data in the main
export file will look like the following:
2,"pictures.dat.100.999","john"
3,"pictures.dat.999.9999","Robert"
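Parsing such a reference could be sketched as follows. Since the file
name itself may contain dots, the offset and length have to be taken
from the right; this parsing scheme is my assumption for illustration,
not part of the proposal:

```java
// Sketch of parsing the proposed lob location reference string
// fileName.ext.StartOffset.Length, e.g. "pictures.dat.100.999".
// The last two dot-separated tokens are offset and length; everything
// before them is the file name.
public class LobReference {
    final String fileName;
    final long offset;
    final long length;

    LobReference(String ref) {
        int lastDot = ref.lastIndexOf('.');
        int prevDot = ref.lastIndexOf('.', lastDot - 1);
        fileName = ref.substring(0, prevDot);
        offset = Long.parseLong(ref.substring(prevDot + 1, lastDot));
        length = Long.parseLong(ref.substring(lastDot + 1));
    }

    public static void main(String[] args) {
        LobReference r = new LobReference("pictures.dat.100.999");
        System.out.println(r.fileName + " " + r.offset + " " + r.length);
        // pictures.dat 100 999
    }
}
```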
What will the large object file name be during export?
There are two possible options:
1) Let Derby generate the file name to store the lobs, by appending a
string like "_lobdata" to the export file name given by the user.
For example, if the user had given "employees.del", the large object
file name will be employees_lobdata.del.
The advantage of this approach is that no new procedures need to be
defined. In this case, by default, all the large object data will
always be written to an external file.
The disadvantage is that if the user wants all the data for a table in
one file for some reason, for example if there are only a few clobs in
the table, there is no option. Also, if by chance the file already
exists, export will throw an error and the user will have to move the
existing file, etc.
2) Let the user specify the file name where the blob/clob data will be
written. To support this, Derby needs to add four new import/export
procedures, because the user needs to pass the external file name
during export and to indicate during import that the lobs are in an
external file.
All the import/export procedure names will have "LOBS_IN_EXTFILE"
appended, and the export procedures will have an extra argument that
takes the file name where the lobs are written.
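As a small illustration of option 1's generated name, "_lobdata" would
be inserted before the extension of the user-supplied file name (the
exact placement is inferred from the employees.del example above):

```java
// Sketch of option 1's generated lob file name: insert "_lobdata"
// before the extension of the export file name given by the user.
public class LobFileName {
    static String lobFileName(String exportFile) {
        int dot = exportFile.lastIndexOf('.');
        if (dot < 0) return exportFile + "_lobdata"; // no extension
        return exportFile.substring(0, dot) + "_lobdata"
                + exportFile.substring(dot);
    }

    public static void main(String[] args) {
        System.out.println(lobFileName("employees.del"));
        // employees_lobdata.del
    }
}
```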
New Export Procedures:
SYSCS_UTIL.SYSCS_EXPORT_TABLE_LOBS_IN_EXTFILE(..,IN LOBSFILENAME
VARCHAR(32672))
SYSCS_UTIL.SYSCS_EXPORT_QUERY_LOBS_IN_EXTFILE(..,IN LOBSFILENAME
VARCHAR(32672))
New IMPORT Procedures:
SYSCS_UTIL.SYSCS_IMPORT_DATA_LOBS_IN_EXTFILE(..)
SYSCS_UTIL.SYSCS_IMPORT_TABLE_LOBS_IN_EXTFILE(..)
With this approach, the old procedures will either fail on large
object data or will write it along with the other column data. I am
inclined towards the second approach, except for the fact that there
will be another 4 new system procedures.
If maintaining backward compatibility for the import/export procedures
is not necessary, then only new arguments are needed for the existing
procedures. But if some users are using them in an application, this
will break it. I am generally hesitant to change existing procedure
signatures; maybe it does not matter for import/export.
To summarize:
1) Large objects are stored along with the other data or in an
external file.
2) Binary data (the blob data type) is not modified during export,
even if it contains delimiters.
3) Import will fail gracefully, or get confused and throw weird
errors, if it finds delimiters inside blob data that is stored along
with the other data in the import file.
4) Four new procedures are required to allow users to specify where to
read/write the large object data.
Any suggestions or comments ?
Thanks
-suresh