Hi Pedro,

Thanks for working your way through your GCS implementation of the
bitstore. Contributions here are very welcome, especially ones that help
DSpace fit natively into the cloud. The cleanup/delete logic is pretty
dense, and I don't have any specific advice until I can dive back into that
chunk of code. I was pretty sure that checksums and deletes worked when that
code was checked in. I would also add that if anyone is able to clean up
the cleanup process, feel free.
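
On the checksum side, here is a minimal sketch of the general pattern:
compute an MD5 while streaming the bytes, and use try-with-resources so
the stream always gets closed. It is plain JDK code, not the actual
S3BitStoreService logic; the names ChecksumSketch and copyWithMd5 are made
up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of DSpace.
class ChecksumSketch {

    /** Copies in to out and returns the MD5 of the copied bytes as lowercase hex. */
    static String copyWithMd5(InputStream in, OutputStream out)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        // try-with-resources closes the wrapper (and the wrapped stream),
        // even if the copy fails halfway through.
        try (DigestInputStream dis = new DigestInputStream(in, md5)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = dis.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(data);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            System.out.println(copyWithMd5(in, out));
        }
    }
}
```

The digest is computed in the same pass as the copy, so a large object
never has to be read twice or held in memory.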

I'm not sure which branch is best to target. It would probably make sense
to make a pull request against 6.x. However, I'm not sure what stage the
branches are in; it may be that new features can no longer go into 6.x and
would have to be targeted against 7.x. I suppose a case could be made that
a GCS implementation isn't a new feature, but just fills in a missing piece
of an existing 6.x feature. In that case you'd need a PR for each of 6.x
and 7.x.

I do know that an alternative to a delete would be to migrate the
assetstore: bitstore-migrate -a 0 -b 1 would move/copy all the assets that
DSpace still knows about from bitstore[0] to bitstore[1]. You could then
delete the bucket/folder for bitstore[0]. Not graceful, though.
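
Conceptually, the reason this works as a cleanup is sketched below: only
the assets that are still known get copied, so anything orphaned stays
behind in the old store and goes away when you delete it. This is an
in-memory illustration only, not the bitstore-migrate code; the map-based
stores and the migrate helper are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not part of DSpace.
class MigrateSketch {

    /**
     * Copies every known asset id from source to target and returns how
     * many were copied. Known ids missing from the source are skipped.
     */
    static int migrate(List<String> knownIds,
                       Map<String, byte[]> source,
                       Map<String, byte[]> target) {
        int copied = 0;
        for (String id : knownIds) {
            byte[] content = source.get(id);
            if (content != null) {
                target.put(id, content);
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        Map<String, byte[]> store0 = new HashMap<>();
        Map<String, byte[]> store1 = new HashMap<>();
        store0.put("a1", new byte[] {1});
        store0.put("b2", new byte[] {2});
        // Deleted in the application, but still sitting in the old store.
        store0.put("orphan", new byte[] {3});

        int copied = migrate(Arrays.asList("a1", "b2"), store0, store1);
        System.out.println("copied=" + copied
                + ", orphan migrated=" + store1.containsKey("orphan"));
    }
}
```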

________________
Peter Dietz
Longsight
www.longsight.com
[email protected]
p: 740-599-5005 x809

On Tue, Jan 17, 2017 at 1:58 PM, Pedro Amorim <[email protected]> wrote:

> Hello Peter,
>
> First of all, I'm sorry for not replying sooner. I had a rather long
> (and deserved) vacation, and I also didn't want to get back to this
> before starting to implement it.
> Secondly, thank you so much for replying; your input has helped
> me a lot in my work.
> I've had many difficulties implementing this, partly because I'm
> very inexperienced in Java, have never worked with the GCS client library
> before, and have never programmed in the DSpace core.
>
> I believe I've made some good progress: I can already write and read files
> from a bucket in Google Cloud Storage. Everything works as expected in the
> application and everything is surprisingly fast and consistent.
> However, I'm struggling to delete objects from the bucket. The main
> problem is that the remove() method in my GCSBitStoreService.java class
> is never called. Every other method (put, get and about) is called and
> works as expected, and I have also implemented the updateTime, hash and
> Etag. Note that I can remove the object in the application: it gets
> removed from the record, from the database and from the index, just not
> from the bitstore (Google Cloud Storage).
>
> I was expecting the object to remain in the bucket until cleanup is run,
> either from cron or manually, because deleting an object in the
> application doesn't actually remove it from storage.
> I then ran *cleanup -v* and everything ran smoothly, with no errors and no
> exceptions, but the object still remains in the bucket. It can't be
> related to bucket permissions because the application never actually runs
> the remove() method (which contains the Storage call).
>
> After that, I started digging into the cleanup method:
> https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/storage/bitstore/BitstreamStorageServiceImpl.java#L220
>
> I noticed the 'recent' check and made sure the object was deleted more
> than an hour ago. Still no success. I even commented that check out, and
> still nothing.
>
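
For anyone following along, a guard like the 'recent' check can be
sketched as below. This is a standalone illustration under my own
assumptions, not the code in BitstreamStorageServiceImpl: the one-hour
window is taken from the description above, and the class and method names
are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration, not part of DSpace.
class RecentGuardSketch {

    // Assumed window: anything touched within the last hour is left alone.
    static final Duration WINDOW = Duration.ofHours(1);

    /** Returns true if lastModified is not before now, or less than WINDOW before now. */
    static boolean isRecent(Instant lastModified, Instant now) {
        if (!lastModified.isBefore(now)) {
            // Timestamp equal to now or in the future: treat as still being written.
            return true;
        }
        return Duration.between(lastModified, now).compareTo(WINDOW) < 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        System.out.println(isRecent(now.minus(Duration.ofMinutes(10)), now));
        System.out.println(isRecent(now.minus(Duration.ofHours(2)), now));
    }
}
```

A cleanup loop built on a guard like this would skip every object for
which isRecent returns true and only consider the rest for removal.
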
> Looking at the cleanup code, I can't really tell where the object is
> supposed to be removed.
> It seems to me it only calls the remove() method if versioning is enabled
> and more than one object is found:
> https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/storage/bitstore/BitstreamStorageServiceImpl.java#L289
> Any insight on this?
>
> Also, one last question:
> If I am to contribute my implementation to the DSpace project, how should I
> go about it? Make a pull request against the dspace6.0 branch on GitHub?
>
> Sorry for the wall of text.
>
> Thank you very much,
>
> Pedro Amorim
>
>
> On Monday, December 19, 2016 at 13:50:49 UTC-1, Peter Dietz wrote:
>>
>> Hi Pedro,
>>
>> Yes, these steps look like what you have to do to add a new
>> storage/BitStore implementation. And yes, add the Google SDK
>> to dspace-api/pom.xml instead of dspace/pom.xml.
>>
>> One implementation pointer I'd like to suggest: it would be nice
>> to support both simple/small objects (~5MB) and multipart
>> upload/download, so that it also works with large objects in the
>> 5GB - 5TB range.
>> https://cloud.google.com/storage/docs/json_api/v1/how-tos/multipart-upload
>>
>> Also, be mindful as to what you want to use to track the checksums.
>> https://cloud.google.com/storage/docs/hashes-etags
>>
>> Lastly, ensure that you close any resources opened when putting or
>> getting objects to the store. You can run out of resources if you open an
>> HTTP connection each time, but never close it.
>>
>> Good luck.
>>
>>
>> ________________
>> Peter Dietz
>> Longsight
>> www.longsight.com
>> [email protected]
>> p: 740-599-5005 x809
>>
>> On Mon, Dec 19, 2016 at 7:52 AM, Pedro Amorim <[email protected]> wrote:
>>
>>> Just noticed, the GCS Java client in 1) should be added to
>>> [dspace-source]/dspace-api/pom.xml instead
>>> of [dspace-source]/dspace/pom.xml as originally mentioned.
>>>
>>> On Monday, December 19, 2016 at 11:46:10 UTC-1, Pedro Amorim wrote:
>>>>
>>>> Hello everyone,
>>>>
>>>> I'd like to store DSpace bitstreams in a GCS bucket, much like the way
>>>> the S3BitStore is implemented.
>>>> Before I start tinkering and testing, I'd much appreciate some input on
>>>> the matter from more experienced Java programmers, namely DSpace
>>>> programmers.
>>>>
>>>> So, from what I've gathered so far, I need to:
>>>>
>>>> 1) Add Google Cloud Storage Java libraries in
>>>> [dspace-source]/dspace/pom.xml and perform a rebuild. This will be needed
>>>> for step 2). These libraries are as provided here:
>>>> https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-java
>>>>
>>>> 2) Create and implement a new GCSBitStoreService.java based on the one
>>>> created for S3:
>>>> https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/storage/bitstore/S3BitStoreService.java
>>>>
>>>> 3) Add new BitStore in bitstore.xml:
>>>> https://github.com/DSpace/DSpace/blob/master/dspace/config/spring/api/bitstore.xml
>>>>
>>>> 4) Activate the new BitStore in bitstore.xml as documented here:
>>>> https://wiki.duraspace.org/display/DSDOC6x/Storage+Layer#StorageLayer-ConfiguringAmazonS3Storage
>>>>
>>>> And that's about it?
>>>>
>>>> Thanks as always,
>>>>
>>>> Pedro Amorim
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "DSpace Technical Support" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/dspace-tech.
>>> For more options, visit https://groups.google.com/d/optout.
>>>

