Hi Damian Józefowski,

Thank you.

Could you please share that information? It could help us find a solution 
to this issue.

Regards
Sai Kumar S

On Thursday 13 June 2024 at 18:32:38 UTC+5:30 Damian Józefowski wrote:

> Hi, 
> In one of our projects at PCG we have added an option to this script which 
> allows you to set a date from which the thumbnails will be generated. The 
> date is compared against each item's last-modified value.
> If you are interested, we can prepare a PR with this solution.
>
> Best regards
> Damian Józefowski
>
> On Wed, 12 Jun 2024 at 17:24, DSpace Community <[email protected]> wrote:
>
>> Hi Sai & Daan,
>>
>> The filter-media script always loops through all objects to *determine* 
>> which ones need to be processed.  This script is in charge of *not only* 
>> generating thumbnails, but also extracting text for indexing purposes (and 
>> any other actions that are enabled as "filter.plugins" in your dspace.cfg).  
>> See the full docs at 
>> https://wiki.lyrasis.org/display/DSDOC7x/Mediafilters+for+Transforming+DSpace+Content
>>
>> So, this script doesn't keep a list of objects which already have 
>> generated thumbnails.  The reason is that, even if a file has a generated 
>> thumbnail, it's possible the file needs to be processed by other filters 
>> (e.g. for full text indexing the textual content may be extracted).  So, 
>> every time you run "filter-media" it will loop through every file...but 
>> will skip any files that it notices were already processed (e.g. if the 
>> file already has a thumbnail or extracted text, it will not re-generate it 
>> unless you use the "-f" flag to force regeneration).   
>>
>> The "skip mode" (-s flag) concept can also be used to tell it to skip 
>> entire communities/collections/items...but then it will never process that 
>> object again until it is removed from the skip list.  So, this should be 
>> used sparingly unless you are sure the object will never need a new 
>> thumbnail or full text indexing, etc.
>>
>> There are options to process files little by little (using the "-m" or 
>> maximum flag) or even process files community-by-community or 
>> collection-by-collection (using the "-i" or identifier flag) in order to 
>> break down a larger job into smaller chunks.
>>
>> This is simply how this tool works at this time.  I do agree there may be 
>> ways to make it more efficient.  But, we haven't had a developer volunteer 
>> to do such work or to redesign the current process.  If you or anyone else 
>> out in the community are interested in helping to improve this tool, I'm 
>> sure the Committers would welcome ideas.  All code in DSpace is 
>> built/supported by volunteers and users. We don't have a centralized 
>> development team (i.e. I have no developers working for me).
>>
>> Semi-related to this, there have been past discussions about migrating 
>> all media filter scripts/tools into curation tasks (which would allow these 
>> processes to be run one-by-one as each new submission is added to DSpace, 
>> instead of via the current bulk processing script).  There are some older 
>> tickets/PRs related to that, but it has never been finished / found to be 
>> fully working.  See https://github.com/DSpace/DSpace/issues/6398 and 
>> https://github.com/DSpace/DSpace/pull/1674   (That said, I'd love to see 
>> this work completed at some point.)
>>
>> Tim
>>
>>
>>
>> On Tuesday, June 11, 2024 at 8:58:49 AM UTC-5 [email protected] 
>> wrote:
>>
>>> Hi Daan,
>>>
>>> Thank you for your reply.
>>>
>>> As you said, if I have to restore an entire database and the assetstore, 
>>> it depends on whether the thumbnails were generated before the backup 
>>> was taken. If they were, there is no need to regenerate them from 
>>> scratch (I may be wrong; please correct me if any of this is incorrect).
>>>
>>> What I wanted to know is why thumbnail generation starts from scratch 
>>> each time I run it (even though the already generated thumbnails get 
>>> skipped anyway).
>>>
>>> I wondered whether there is another method that does not read the 
>>> already generated thumbnails and only generates the required ones (that 
>>> is, for files which do not yet have thumbnails).
>>>
>>> Regards 
>>> Sai Kumar S 
>>>
>>> On Tue, 11 Jun 2024, 11:06 am Daan Lessing, <[email protected]> wrote:
>>>
>>>> Good morning,
>>>>
>>>> Just a follow-up question on this. Let's say for instance you have to 
>>>> restore an entire database and the assetstore, do you lose all thumbnails 
>>>> and will filter-media have to start building thumbnails from scratch?
>>>>
>>>> I have been running filter-media and it has been running for 3 weeks 
>>>> and not yet completed. 
>>>>
>>>> Looking forward to your response.
>>>>
>>>> Kind regards,
>>>> Daan
>>>>
>>>> On Tue, Jun 11, 2024 at 6:28 AM SAI KUMAR S <[email protected]> 
>>>> wrote:
>>>>
>>>>> Hi Tim
>>>>>
>>>>> Thank you for the information.
>>>>>
>>>>> The issue is that when we run the command line *./dspace filter-media*, 
>>>>> the files that already have thumbnails are also read, although they are 
>>>>> skipped. This means the process reads the files from the beginning each 
>>>>> time, which takes more time as the number of files increases.
>>>>>
>>>>> Is there any other method, such as executing a script, for generating 
>>>>> thumbnails more efficiently?
>>>>> Regards
>>>>> Sai Kumar S
>>>>>
>>>>> On Tuesday 11 June 2024 at 02:37:15 UTC+5:30 DSpace Community wrote:
>>>>>
>>>>>> Hi Sai,
>>>>>>
>>>>>> If you run "filter-media" **without** the "-f" flag, then it should 
>>>>>> automatically skip all Items that already have generated thumbnails.   
>>>>>> For example:
>>>>>>
>>>>>> ./dspace filter-media
>>>>>>
>>>>>> When you run it **with** the "-f" flag, that tells the filter-media 
>>>>>> script to **regenerate all thumbnails**.
>>>>>>
>>>>>> For more information see the documentation on this script 
>>>>>> <https://wiki.lyrasis.org/display/DSDOC7x/Mediafilters+for+Transforming+DSpace+Content#MediafiltersforTransformingDSpaceContent-Executing(viaCommandLine)>
>>>>>> .
>>>>>>
>>>>>> (The "skip list" is only needed if you have files which are 
>>>>>> consistently throwing errors and you want to *skip them from all future 
>>>>>> runs* of the "filter-media" script.  But, it shouldn't be necessary in 
>>>>>> your use case.)
>>>>>>
>>>>>> Tim
>>>>>>
>>>>>> On Monday, June 10, 2024 at 5:09:33 AM UTC-5 [email protected] 
>>>>>> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I have a query regarding filter-media. I have uploaded around 1000 
>>>>>>> books to a collection and generated thumbnails for the PDF files using 
>>>>>>> the command line *dspace filter-media -f.*
>>>>>>>
>>>>>>> However, when I upload another 1000 files to the same collection, I 
>>>>>>> need to generate thumbnails only for the newly uploaded files. I tried 
>>>>>>> using the skip mode by creating a *skip-list.txt*, but I am not 
>>>>>>> getting the desired result.
>>>>>>>
>>>>>>> Could any of you provide an example of how to correctly use 
>>>>>>> the skip-list.txt method to generate thumbnails?
>>>>>>>
>>>>>>> Alternatively, is there any other method, such as using a script 
>>>>>>> (e.g., Python), to generate the thumbnails for only the newly uploaded 
>>>>>>> files?
>>>>>>>
>>>>>>> Please help me solve this query.
>>>>>>>
>>>>>>> Thanks & Regards
>>>>>>> Sai Kumar S
>>>>>>>
>>>>>>> -- 
>>>>> All messages to this mailing list should adhere to the Code of 
>>>>> Conduct: https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
>>>>> --- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "DSpace Community" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/dspace-community/07d120cd-74de-4420-b49d-d3ee6744738an%40googlegroups.com
>>>>>  
>>>>> <https://groups.google.com/d/msgid/dspace-community/07d120cd-74de-4420-b49d-d3ee6744738an%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>
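
As a rough illustration of the chunking approach Tim describes above (the 
"-i" identifier flag together with the "-m" maximum flag), a wrapper along 
these lines might help break a long run into smaller pieces. This is only a 
sketch under assumptions: the handles are made-up placeholders, the function 
name and the DSPACE_CMD variable are hypothetical, and combining "-i" with 
"-m" in a single call should be checked against the documentation for your 
DSpace version. By default it only prints the commands (dry run).

```shell
#!/bin/sh
# Dry run by default: the commands are printed, not executed.
# To run for real, set DSPACE_CMD to your own dspace launcher,
# e.g. DSPACE_CMD=./dspace (the path is an assumption about your install).
DSPACE_CMD="${DSPACE_CMD:-echo ./dspace}"

# filter_media_chunks MAX HANDLE...
# Runs filter-media once per handle (community, collection or item),
# processing at most MAX items in each run. No -f flag is passed, so
# files that already have thumbnails or extracted text are skipped.
filter_media_chunks() {
    max_items="$1"
    shift
    for handle in "$@"; do
        # DSPACE_CMD is left unquoted on purpose so "echo ./dspace" splits
        # into a command plus its first argument.
        $DSPACE_CMD filter-media -i "$handle" -m "$max_items" || return 1
    done
}

# Placeholder handles; replace them with your own collection handles.
filter_media_chunks 1000 123456789/2 123456789/3
```

In dry-run mode this prints one "./dspace filter-media -i <handle> -m 1000" 
line per handle. Note that each run still loops over the objects inside the 
given collection to decide what needs processing; the sketch only narrows the 
scope of each run, it does not change how the script works.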

-- 
All messages to this mailing list should adhere to the Code of Conduct: 
https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
--- 
You received this message because you are subscribed to the Google Groups 
"DSpace Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dspace-community/ab9c083c-402b-4895-b115-5bffcefc6e52n%40googlegroups.com.
