Hi Tim & Daan,

*Thank you* for the information; it was very helpful.

Regards
Sai Kumar S

On Wednesday 12 June 2024 at 20:54:20 UTC+5:30 DSpace Community wrote:

> Hi Sai & Daan,
>
> The filter-media script always loops through all objects to *determine* 
> which ones need to be processed.  This script is in charge *not only* of 
> thumbnails, but also of extracting text for indexing purposes (and any 
> other actions that are enabled as "filter.plugins" in your dspace.cfg).  
> See the full docs at 
> https://wiki.lyrasis.org/display/DSDOC7x/Mediafilters+for+Transforming+DSpace+Content
>
> So, this script doesn't keep a list of objects which have already had 
> generated thumbnails.  The reason is that, even if a file has a generated 
> thumbnail, it's possible the file needs to be processed by other filters 
> (e.g. for full text indexing the textual content may be extracted).  So, 
> every time you run "filter-media" it will loop through every file...but 
> will skip any files that it notices were already processed (e.g. if the 
> file already has a thumbnail or extracted text, it will not re-generate it 
> unless you use the "-f" flag to force regeneration).   
>
> The "skip mode" (-s flag) concept can also be used to tell it to skip 
> entire communities/collections/items...but then it will never process that 
> object again until it is removed from the skip list.  So, this should be 
> used sparingly, unless you are sure the object will never need a new 
> thumbnail or full text indexing, etc.
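A minimal sketch of feeding a skip list to filter-media. It assumes the convention described in the DSpace 7 documentation, where -s takes Handles separated by commas with no spaces, and a long list can be kept in a file and read back in. As a convenience of my own, the file here holds one Handle per line and a small helper joins them. The function name skip_list_from_file, the file contents, the Handles and the DSPACE_BIN path are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: helper name, file contents, Handles and DSPACE_BIN path are hypothetical.
set -euo pipefail

# Join a file of Handles (one per line) into the "h1,h2,..." form that -s expects.
skip_list_from_file() {
  # Trim leading/trailing whitespace (including a Windows CR), drop blank lines,
  # then join the remaining lines with commas and no spaces.
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1" \
    | paste -sd, -
}

# Example list: two Handles, plus a blank line and stray spaces to show the cleanup.
skip_file="$(mktemp)"
printf '%s\n' '123456789/9' '' '  123456789/100  ' > "$skip_file"

SKIP_LIST="$(skip_list_from_file "$skip_file")"
echo "Skipping: $SKIP_LIST"   # prints: Skipping: 123456789/9,123456789/100

# Assumed install path; the real call only runs if that executable exists.
DSPACE_BIN="${DSPACE_BIN:-/dspace/bin/dspace}"
if [ -x "$DSPACE_BIN" ]; then
  # Objects on the list are skipped on every run that is given this -s argument.
  "$DSPACE_BIN" filter-media -s "$SKIP_LIST"
fi
```

As noted above, this is for objects you want excluded on every run. It is not a way to select newly uploaded files, because a plain run without -f already skips files that were processed before.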
>
> There are options to process files little by little (using the "-m" or 
> maximum flag) or even process files community-by-community or 
> collection-by-collection (using the "-i" or identifier flag) in order to 
> break down a larger job into smaller chunks.
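A minimal sketch of breaking one long run into smaller ones, using only the -i (Handle of a community, collection or item) and -m (maximum number of items per run) flags mentioned above. The collection Handles, the limit of 500, the run_chunk function and the DSPACE_BIN path are placeholders, not recommendations.

```shell
#!/usr/bin/env bash
# Sketch only: Handles, MAX_ITEMS, run_chunk and DSPACE_BIN are hypothetical.
set -euo pipefail

DSPACE_BIN="${DSPACE_BIN:-/dspace/bin/dspace}"
MAX_ITEMS=500
COLLECTIONS=("123456789/2" "123456789/3")

# Process one collection, stopping after at most MAX_ITEMS items.
# No -f flag, so files that already have a thumbnail or extracted text are skipped.
run_chunk() {
  local handle="$1"
  local cmd=("$DSPACE_BIN" filter-media -i "$handle" -m "$MAX_ITEMS")
  echo "${cmd[*]}"
  # Only execute when the assumed dspace launcher actually exists.
  if [ -x "$DSPACE_BIN" ]; then
    "${cmd[@]}"
  fi
}

for handle in "${COLLECTIONS[@]}"; do
  run_chunk "$handle"
done
```

Each run still walks the items in its scope, so this does not avoid the full loop. What it changes is that each run is shorter, and a failure only affects one chunk.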
>
> This is simply how this tool works at this time.  I do agree there may be 
> ways to make it more efficient.  But, we haven't had a developer volunteer 
> to do such work or to redesign the current process.  If you or anyone else 
> out in the community are interested in helping to improve this tool, I'm 
> sure the Committers would welcome ideas.  All code in DSpace is 
> built/supported by volunteers and users. We don't have a centralized 
> development team (i.e. I have no developers working for me).
>
> Semi-related to this, there have been past discussions about migrating 
> all media filter scripts/tools into curation tasks (which would allow these 
> processes to be run one-by-one as each new submission is added to DSpace, 
> instead of via the current bulk processing script).  There are some older 
> tickets/PRs related to that, but that work has never been finished or found to be 
> fully working.  See https://github.com/DSpace/DSpace/issues/6398 and 
> https://github.com/DSpace/DSpace/pull/1674   (That said, I'd love to see 
> this work completed at some point.)
>
> Tim
>
>
>
> On Tuesday, June 11, 2024 at 8:58:49 AM UTC-5 [email protected] 
> wrote:
>
>> Hi Daan,
>>
>> Thank you for your reply.
>>
>> As you said, if I have to restore an entire database and the assetstore, 
>> it depends on whether the thumbnails had been generated before the backup 
>> was taken; if they had, there is no need to regenerate the thumbnails 
>> from scratch. (I may not be correct; if any information I have given is 
>> wrong, please correct me.)
>>
>> What I wanted to know is why, when I run thumbnail generation, it starts 
>> from scratch (even though the already-generated thumbnails get skipped 
>> anyway).
>>
>> Is there any other method where the already-generated thumbnails are not 
>> read, and only the missing thumbnails (for files that do not have one yet) 
>> are generated?
>>
>> Regards 
>> Sai Kumar S 
>>
>> On Tue, 11 Jun 2024, 11:06 am Daan Lessing, <[email protected]> wrote:
>>
>>> Good morning,
>>>
>>> Just a follow-up question on this. Let's say for instance you have to 
>>> restore an entire database and the assetstore, do you lose all thumbnails 
>>> and will filter-media have to start building thumbnails from scratch?
>>>
>>> I have been running filter-media for 3 weeks and it has not yet 
>>> completed. 
>>>
>>> Looking forward to your response.
>>>
>>> Kind regards,
>>> Daan
>>>
>>>
>>> On Tue, Jun 11, 2024 at 6:28 AM SAI KUMAR S <[email protected]> 
>>> wrote:
>>>
>>>> Hi Tim
>>>>
>>>> Thank you for the information.
>>>>
>>>> The issue is that when we run the command *./dspace filter-media*, 
>>>> files that already have generated thumbnails are also read, even though 
>>>> they are skipped. This means the process reads every file from the 
>>>> beginning each time, which takes more time as the number of files 
>>>> increases.
>>>>
>>>> Is there any other method, such as executing a script, for generating 
>>>> thumbnails more efficiently?
>>>>
>>>> Regards
>>>> Sai Kumar S
>>>>
>>>> On Tuesday 11 June 2024 at 02:37:15 UTC+5:30 DSpace Community wrote:
>>>>
>>>>> Hi Sai,
>>>>>
>>>>> If you run "filter-media" **without** the "-f" flag, then it should 
>>>>> automatically skip all Items that already have generated thumbnails. 
>>>>> For example:
>>>>>
>>>>> ./dspace filter-media
>>>>>
>>>>> When you run it **with** the "-f" flag, that tells the filter-media 
>>>>> script to **regenerate all thumbnails**.
>>>>>
>>>>> For more information, see the documentation on this script 
>>>>> <https://wiki.lyrasis.org/display/DSDOC7x/Mediafilters+for+Transforming+DSpace+Content#MediafiltersforTransformingDSpaceContent-Executing(viaCommandLine)>.
>>>>>
>>>>> (The "skip list" is only needed if you have files which are 
>>>>> consistently throwing errors and you want to *skip them from all future 
>>>>> runs* of the "filter-media" script.  But, it shouldn't be necessary in 
>>>>> your use case.)
>>>>>
>>>>> Tim
>>>>>
>>>>> On Monday, June 10, 2024 at 5:09:33 AM UTC-5 [email protected] 
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I have a query regarding filter-media. I uploaded around 1000 
>>>>>> books to a collection and generated thumbnails for the PDF files using 
>>>>>> the command *dspace filter-media -f*.
>>>>>>
>>>>>> However, when I upload another 1000 files to the same collection, I 
>>>>>> need to generate thumbnails only for the newly uploaded files. I tried 
>>>>>> using the skip mode by creating a *skip-list.txt*, but I am not 
>>>>>> getting the desired result.
>>>>>>
>>>>>> Could any of you provide an example of how to correctly use the 
>>>>>> skip-list.txt method to generate thumbnails?
>>>>>>
>>>>>> Alternatively, is there any other method, such as using a script 
>>>>>> (e.g., Python), to generate the thumbnails for only the newly uploaded 
>>>>>> files?
>>>>>>
>>>>>> Please help me solve this query.
>>>>>>
>>>>>> Thanks & Regards
>>>>>> Sai Kumar S
>>>>>>
>>>>>> -- 
>>>> All messages to this mailing list should adhere to the Code of Conduct: 
>>>> https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
>>>> --- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "DSpace Community" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/dspace-community/07d120cd-74de-4420-b49d-d3ee6744738an%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/dspace-community/07d120cd-74de-4420-b49d-d3ee6744738an%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>
