Hi,
In one of our projects at PCG, we have added an option to this script which
allows you to set a date from which thumbnails will be generated. The date is
compared against each item's last-modified value.
If you are interested, we can prepare a PR with this solution.

Best regards
Damian Józefowski
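To make the date-based option described above concrete, here is a minimal Python sketch of the selection step only. It is not the PCG patch: the `Item` class, its fields and the handles are hypothetical, and treating the cutoff as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Item:
    # Hypothetical stand-in for a DSpace item; only the fields this sketch needs.
    handle: str
    last_modified: datetime


def items_modified_since(items, from_date):
    """Return items whose last-modified timestamp is on or after from_date."""
    return [item for item in items if item.last_modified >= from_date]


if __name__ == "__main__":
    items = [
        Item("123456789/1", datetime(2024, 5, 1, tzinfo=timezone.utc)),
        Item("123456789/2", datetime(2024, 6, 10, tzinfo=timezone.utc)),
    ]
    cutoff = datetime(2024, 6, 1, tzinfo=timezone.utc)
    for item in items_modified_since(items, cutoff):
        print(item.handle)  # prints 123456789/2
```

Because an item's last-modified value typically changes on any update (for example a metadata edit), a cutoff like this would likely also select edited items, not only newly uploaded ones.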

On Wed, 12 Jun 2024 at 17:24, DSpace Community <
[email protected]> wrote:

> Hi Sai & Daan,
>
> The filter-media script always loops through all objects to *determine*
> which ones need to be processed.  This script is responsible *not only*
> for thumbnails, but also for extracting text for indexing purposes (and any
> other actions that are enabled as "filter.plugins" in your dspace.cfg).
> See the full docs at
> https://wiki.lyrasis.org/display/DSDOC7x/Mediafilters+for+Transforming+DSpace+Content
>
> So, this script doesn't keep a list of objects which have already had
> generated thumbnails.  The reason is that, even if a file has a generated
> thumbnail, it's possible the file needs to be processed by other filters
> (e.g. for full text indexing the textual content may be extracted).  So,
> every time you run "filter-media" it will loop through every file...but
> will skip any files that it notices were already processed (e.g. if the
> file already has a thumbnail or extracted text, it will not re-generate it
> unless you use the "-f" flag to force regeneration).
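The behaviour described above can be modelled with a small Python sketch. It is a simplified illustration, not DSpace's implementation: bitstreams and filters are plain strings, and `existing` is a hypothetical set recording which (bitstream, filter) pairs already have derived output. It shows why a run still inspects every file even when only a few need work.

```python
def plan_run(bitstreams, filters, existing, force=False):
    """Return (inspected, to_process) for one simulated filter-media run.

    bitstreams: list of bitstream names (hypothetical identifiers)
    filters: list of filter names, e.g. "thumbnail" or "text"
    existing: set of (bitstream, filter) pairs that already have output
    force: mimics the "-f" behaviour of regenerating everything
    """
    inspected = 0
    to_process = []
    for bitstream in bitstreams:
        for name in filters:
            inspected += 1  # every pair is examined on every run
            if force or (bitstream, name) not in existing:
                to_process.append((bitstream, name))
    return inspected, to_process


if __name__ == "__main__":
    bitstreams = ["book1.pdf", "book2.pdf", "book3.pdf"]
    filters = ["thumbnail", "text"]
    # book1 and book2 were processed by an earlier run; book3 is new.
    existing = {(b, f) for b in bitstreams[:2] for f in filters}
    inspected, todo = plan_run(bitstreams, filters, existing)
    print(inspected, todo)  # 6 [('book3.pdf', 'thumbnail'), ('book3.pdf', 'text')]
```

Note that `inspected` stays at 6 even though only one file is new; that linear scan is where the growing runtime comes from.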
>
> The "skip mode" (-s flag) concept can also be used to tell it to skip
> entire communities/collections/items...but then it will never process that
> object again until it is removed from the skip list.  So, this should be
> used sparingly unless you are sure the object will never need a new
> thumbnail or full text indexing, etc.
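A skip list is a plain text file passed with `-s` (e.g. `./dspace filter-media -s skip-list.txt`). The sketch below assumes one handle per line, which is how we understand the documented format, and models the rule stated above: a listed community or collection excludes everything inside it. The `parent_handles` argument and all handles are hypothetical.

```python
def load_skip_list(text):
    """Parse skip-list contents: one handle per line, blank lines ignored."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def is_skipped(handle, parent_handles, skip):
    """An object is skipped if it, or any community/collection containing it, is listed."""
    return handle in skip or any(parent in skip for parent in parent_handles)


if __name__ == "__main__":
    skip = load_skip_list("123456789/10\n\n123456789/42\n")
    # Item in a listed collection: skipped on every run until the line is removed.
    print(is_skipped("123456789/99", ["123456789/10"], skip))  # True
    print(is_skipped("123456789/98", ["123456789/11"], skip))  # False
```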
>
> There are options to process files little by little (using the "-m" or
> maximum flag) or even process files community-by-community or
> collection-by-collection (using the "-i" or identifier flag) in order to
> break down a larger job into smaller chunks.
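As an illustration of breaking a large job into chunks, the Python sketch below builds one `filter-media` invocation per community or collection handle, using only the `-i` and `-m` options mentioned above. The handles and the batch size of 500 are hypothetical; check the documentation linked above for exactly what `-m` limits on your version.

```python
import shlex


def chunked_commands(handles, maximum, dspace_bin="./dspace"):
    """Build one filter-media invocation per community/collection handle.

    Uses only the -i (identifier) and -m (maximum) options.
    """
    return [
        [dspace_bin, "filter-media", "-i", handle, "-m", str(maximum)]
        for handle in handles
    ]


if __name__ == "__main__":
    # Hypothetical collection handles; substitute your own.
    for cmd in chunked_commands(["123456789/10", "123456789/11"], 500):
        print(shlex.join(cmd))  # e.g. ./dspace filter-media -i 123456789/10 -m 500
```

Each command could then be run in sequence, for example with `subprocess.run(cmd, check=True)`, so that a failure in one collection stops the loop instead of being silently ignored.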
>
> This is simply how this tool works at this time.  I do agree there may be
> ways to make it more efficient.  But, we haven't had a developer volunteer
> to do such work or to redesign the current process.  If you or anyone else
> out in the community are interested in helping to improve this tool, I'm
> sure the Committers would welcome ideas.  All code in DSpace is
> built/supported by volunteers and users. We don't have a centralized
> development team (i.e. I have no developers working for me).
>
> Semi-related to this, there have been past discussions about migrating
> all media filter scripts/tools into curation tasks (which would allow these
> processes to be run one-by-one as each new submission is added to DSpace,
> instead of via the current bulk processing script).  There are some older
> tickets/PRs related to that, but it has never been finished / found to be
> fully working.  See https://github.com/DSpace/DSpace/issues/6398 and
> https://github.com/DSpace/DSpace/pull/1674   (That said, I'd love to see
> this work completed at some point.)
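To illustrate the difference between the two designs discussed above, here is a toy Python model. It is purely hypothetical and does not use DSpace's curation API: the bulk approach visits every item on each run, while an event-driven approach only handles items queued as they are submitted.

```python
from collections import deque


def bulk_run(all_handles, visit):
    """Bulk model: every run visits every item (already-processed ones are
    skipped by the real script, but still inspected)."""
    for handle in all_handles:
        visit(handle)
    return len(all_handles)


class SubmissionQueue:
    """Event-driven model: only newly submitted items are queued and processed."""

    def __init__(self):
        self._pending = deque()

    def on_submission(self, handle):
        # Hypothetical hook, called once when a new item is added.
        self._pending.append(handle)

    def drain(self, visit):
        """Process and remove every pending item; return how many were handled."""
        count = 0
        while self._pending:
            visit(self._pending.popleft())
            count += 1
        return count


if __name__ == "__main__":
    existing = [f"123456789/{n}" for n in range(1, 1001)]
    new = [f"123456789/{n}" for n in range(1001, 2001)]
    queue = SubmissionQueue()
    for handle in new:
        queue.on_submission(handle)

    def noop(handle):
        pass

    print(bulk_run(existing + new, noop))  # 2000
    print(queue.drain(noop))               # 1000
```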
>
> Tim
>
>
>
> On Tuesday, June 11, 2024 at 8:58:49 AM UTC-5 [email protected]
> wrote:
>
>> Hi Daan,
>>
>> Thank you for your reply.
>>
>> As you said, if I have to restore an entire database and the assetstore,
>> it depends on whether the thumbnails were generated before the backup was
>> taken; if they were, there is no need to regenerate them from scratch (I
>> may not be correct; if any information I have given is wrong, please
>> correct me).
>>
>> What I wanted to know is why, when I run thumbnail generation, it starts
>> from scratch (although the already generated thumbnails get skipped anyway).
>>
>> I wondered whether there is any other method where the already generated
>> thumbnails are not read at all and only the missing ones are generated
>> (i.e. for files that do not have thumbnails yet).
>>
>> Regards
>> Sai Kumar S
>>
>> On Tue, 11 Jun 2024, 11:06 am Daan Lessing, <[email protected]> wrote:
>>
>>> Good morning,
>>>
>>> Just a follow-up question on this. Let's say for instance you have to
>>> restore an entire database and the assetstore, do you lose all thumbnails
>>> and will filter-media have to start building thumbnails from scratch?
>>>
>>> I have been running filter-media and it has been running for 3 weeks and
>>> not yet completed.
>>>
>>> Looking forward to your response.
>>>
>>> Kind regards,
>>> Daan
>>>
>>>
>>> On Tue, Jun 11, 2024 at 6:28 AM SAI KUMAR S <[email protected]>
>>> wrote:
>>>
>>>> Hi Tim
>>>>
>>>> Thank you for the information.
>>>>
>>>> The issue is that when we run the command line *./dspace filter-media*,
>>>> the thumbnail-generated files are also read, but they are skipped. This
>>>> means the process reads the files from the beginning each time, which takes
>>>> more time as the number of files increases.
>>>>
>>>> Is there any other method, such as executing a script, for generating
>>>> thumbnails more efficiently?
>>>> Regards
>>>> Sai Kumar S
>>>>
>>>> On Tuesday 11 June 2024 at 02:37:15 UTC+5:30 DSpace Community wrote:
>>>>
>>>>> Hi Sai,
>>>>>
>>>>> If you run "filter-media" **without** the "-f" flag, then it should
>>>>> automatically skip all Items that already have generated thumbnails.   For
>>>>> example:
>>>>>
>>>>> ./dspace filter-media
>>>>>
>>>>> When you run it **with** the "-f" flag, that tells the filter-media
>>>>> script to **regenerate all thumbnails**.
>>>>>
>>>>> For more information, see the documentation on this script
>>>>> <https://wiki.lyrasis.org/display/DSDOC7x/Mediafilters+for+Transforming+DSpace+Content#MediafiltersforTransformingDSpaceContent-Executing(viaCommandLine)>.
>>>>>
>>>>> (The "skip list" is only needed if you have files which are
>>>>> consistently throwing errors and you want to *skip them from all future
>>>>> runs* of the "filter-media" script.  But, it shouldn't be necessary in
>>>>> your use case.)
>>>>>
>>>>> Tim
>>>>>
>>>>> On Monday, June 10, 2024 at 5:09:33 AM UTC-5 [email protected]
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I have a query regarding filter-media. I have uploaded around 1000
>>>>>> books to a collection and generated thumbnails for the PDF files using
>>>>>> the command line *dspace filter-media -f*.
>>>>>>
>>>>>> However, when I upload another 1000 files to the same collection, I
>>>>>> need to generate thumbnails only for the newly uploaded files. I tried
>>>>>> using the skip mode by creating a *skip-list.txt*, but I am not
>>>>>> getting the desired result.
>>>>>>
>>>>>> Could any of you provide an example of how to correctly use the
>>>>>> skip-list.txt method to generate thumbnails?
>>>>>>
>>>>>> Alternatively, is there any other method, such as using a script
>>>>>> (e.g., Python), to generate the thumbnails for only the newly uploaded
>>>>>> files?
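One way to approach the "only newly uploaded files" question from outside DSpace is sketched below in Python. It is an assumption-laden workaround, not an existing DSpace feature: it keeps its own record of processed item handles in a JSON file (`processed.json` is a hypothetical name) and emits one `filter-media -i <handle>` command per new item, assuming `-i` accepts item handles as well as community and collection handles. How you obtain the current list of handles is left open.

```python
import json
import os
import tempfile
from pathlib import Path


def _load(state_file):
    """Read the set of already-processed handles (empty if no state yet)."""
    path = Path(state_file)
    return set(json.loads(path.read_text())) if path.exists() else set()


def new_handles(current_handles, state_file):
    """Return handles not yet recorded as processed, preserving input order."""
    done = _load(state_file)
    return [h for h in current_handles if h not in done]


def record_handles(handles, state_file):
    """Add handles to the state file once their run has succeeded."""
    done = _load(state_file) | set(handles)
    Path(state_file).write_text(json.dumps(sorted(done)))


def commands_for(handles, dspace_bin="./dspace"):
    """One filter-media invocation per item, limited with -i."""
    return [[dspace_bin, "filter-media", "-i", h] for h in handles]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state = os.path.join(tmp, "processed.json")  # hypothetical state file
        record_handles(["123456789/1"], state)
        for cmd in commands_for(new_handles(["123456789/1", "123456789/2"], state)):
            print(" ".join(cmd))  # ./dspace filter-media -i 123456789/2
```

Record handles only after the corresponding command succeeds; otherwise a failed run would be treated as done and never retried.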
>>>>>>
>>>>>> Please help me solve this query.
>>>>>>
>>>>>> Thanks & Regards
>>>>>> Sai Kumar S
>>>>>>
>>>>>> --
>>>> All messages to this mailing list should adhere to the Code of Conduct:
>>>> https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "DSpace Community" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/dspace-community/07d120cd-74de-4420-b49d-d3ee6744738an%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/dspace-community/07d120cd-74de-4420-b49d-d3ee6744738an%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
