Hi Damian Józefowski,

Thank you.

Could you please share that information? It could help us find a solution to the issue.

Regards,
Sai Kumar S

On Thursday 13 June 2024 at 18:32:38 UTC+5:30 Damian Józefowski wrote:

> Hi,
> In one of our projects at PCG we have added an option to this script which
> allows you to set a date from which the thumbnails will be generated. The
> date is compared with the item's last modified value.
> If you are interested, we can prepare a PR with this solution.
>
> Best regards
> Damian Józefowski
>
> On Wed, 12 Jun 2024, 17:24, DSpace Community <
> [email protected]> wrote:
>
>> Hi Sai & Daan,
>>
>> The filter-media script always loops through all objects to *determine*
>> which ones need to be processed. This script is in charge of *not only*
>> thumbnails, but also extracting text for indexing purposes (and any
>> other actions that are enabled as "filter.plugins" in your dspace.cfg).
>> See the full docs at
>> https://wiki.lyrasis.org/display/DSDOC7x/Mediafilters+for+Transforming+DSpace+Content
>>
>> So, this script doesn't keep a list of objects which have already had
>> thumbnails generated. The reason is that, even if a file has a generated
>> thumbnail, it's possible the file needs to be processed by other filters
>> (e.g. for full text indexing the textual content may be extracted). So,
>> every time you run "filter-media" it will loop through every file...but
>> will skip any files that it notices were already processed (e.g. if the
>> file already has a thumbnail or extracted text, it will not re-generate it
>> unless you use the "-f" flag to force regeneration).
>>
>> The "skip mode" (-s flag) concept can also be used to tell it to skip
>> entire communities/collections/items...but then it will never process that
>> object again until it is removed from the skip list. So, this should be
>> used sparingly unless you are sure the object never will need a new
>> thumbnail or full text indexing, etc.
>>
>> There are options to process files little by little (using the "-m" or
>> maximum flag) or even process files community-by-community or
>> collection-by-collection (using the "-i" or identifier flag) in order to
>> break down a larger job into smaller chunks.
>>
>> This is simply how this tool works at this time. I do agree there may be
>> ways to make it more efficient. But, we haven't had a developer volunteer
>> to do such work or to redesign the current process. If you or anyone else
>> out in the community are interested in helping to improve this tool, I'm
>> sure the Committers would welcome ideas. All code in DSpace is
>> built/supported by volunteers and users. We don't have a centralized
>> development team (i.e. I have no developers working for me).
>>
>> Semi-related to this, there have been past discussions about migrating
>> all media filter scripts/tools into curation tasks (which would allow these
>> processes to be run one-by-one as each new submission is added to DSpace,
>> instead of via the current bulk processing script). There are some older
>> tickets/PRs related to that, but it has never been finished / found to be
>> fully working. See https://github.com/DSpace/DSpace/issues/6398 and
>> https://github.com/DSpace/DSpace/pull/1674 (That said, I'd love to see
>> this work completed at some point.)
>>
>> Tim
>>
>> On Tuesday, June 11, 2024 at 8:58:49 AM UTC-5 [email protected]
>> wrote:
>>
>>> Hi Daan,
>>>
>>> Thank you for your reply.
>>>
>>> As you said, if I have to restore an entire database and the assetstore,
>>> it depends on whether the thumbnails were generated before the backup
>>> was taken. If they were, there is no need to regenerate them from
>>> scratch (I may not be correct; if any information I have given is wrong,
>>> please correct me).
>>>
>>> What I wanted to know is why, when I start generating thumbnails, it
>>> starts from scratch (although the already generated thumbnails get
>>> skipped anyway).
>>>
>>> I wondered whether there is any other method where already generated
>>> thumbnails are not read at all and only the required ones (those that
>>> do not yet have thumbnails) are generated.
>>>
>>> Regards
>>> Sai Kumar S
>>>
>>> On Tue, 11 Jun 2024, 11:06 am Daan Lessing, <[email protected]> wrote:
>>>
>>>> Good morning,
>>>>
>>>> Just a follow-up question on this. Let's say, for instance, you have to
>>>> restore an entire database and the assetstore: do you lose all thumbnails,
>>>> and will filter-media have to start building thumbnails from scratch?
>>>>
>>>> I have been running filter-media and it has been running for 3 weeks
>>>> and has not yet completed.
>>>>
>>>> Looking forward to your response.
>>>>
>>>> Kind regards,
>>>> Daan
>>>>
>>>> On Tue, Jun 11, 2024 at 6:28 AM SAI KUMAR S <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Tim
>>>>>
>>>>> Thank you for the information.
>>>>>
>>>>> The issue is that when we run the command line *./dspace filter-media*,
>>>>> the files that already have thumbnails are also read, but they are
>>>>> skipped. This means the process reads the files from the beginning each
>>>>> time, which takes more time as the number of files increases.
>>>>>
>>>>> Is there any other method, such as executing a script, for generating
>>>>> thumbnails more efficiently?
>>>>> Regards
>>>>> Sai Kumar S
>>>>>
>>>>> On Tuesday 11 June 2024 at 02:37:15 UTC+5:30 DSpace Community wrote:
>>>>>
>>>>>> Hi Sai,
>>>>>>
>>>>>> If you run "filter-media" **without** the "-f" flag, then it should
>>>>>> automatically skip all Items that already have generated thumbnails.
>>>>>> For example:
>>>>>>
>>>>>> ./dspace filter-media
>>>>>>
>>>>>> When you run it **with** the "-f" flag, that tells the filter-media
>>>>>> script to **regenerate all thumbnails**.
>>>>>>
>>>>>> For more information see the documentation on this script
>>>>>> <https://wiki.lyrasis.org/display/DSDOC7x/Mediafilters+for+Transforming+DSpace+Content#MediafiltersforTransformingDSpaceContent-Executing(viaCommandLine)>.
>>>>>>
>>>>>> (The "skip list" is only needed if you have files which are
>>>>>> consistently throwing errors and you want to *skip them from all future
>>>>>> runs* of the "filter-media" script. But, it shouldn't be necessary in
>>>>>> your use case.)
>>>>>>
>>>>>> Tim
>>>>>>
>>>>>> On Monday, June 10, 2024 at 5:09:33 AM UTC-5 [email protected]
>>>>>> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I have a query regarding filter-media. I have uploaded around 1000
>>>>>>> books to a collection and generated thumbnails for the PDF files using
>>>>>>> the command line *dspace filter-media -f.*
>>>>>>>
>>>>>>> However, when I upload another 1000 files to the same collection, I
>>>>>>> need to generate thumbnails only for the newly uploaded files. I tried
>>>>>>> using the skip mode by creating a *skip-list.txt*, but I am not
>>>>>>> getting the desired result.
>>>>>>>
>>>>>>> Could anyone of you provide me an example of how to correctly use
>>>>>>> the skip-list.txt method to generate thumbnails?
>>>>>>>
>>>>>>> Alternatively, is there any other method, such as using a script
>>>>>>> (e.g., Python), to generate the thumbnails for only the newly uploaded
>>>>>>> files?
>>>>>>>
>>>>>>> Please help me solve this query.
>>>>>>>
>>>>>>> Thanks & Regards
>>>>>>> Sai Kumar S
>>>>>>>
>>>>> --
>>>>> All messages to this mailing list should adhere to the Code of
>>>>> Conduct: https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
>>>>> ---
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "DSpace Community" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/dspace-community/07d120cd-74de-4420-b49d-d3ee6744738an%40googlegroups.com
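
For anyone reading this thread later: the chunking approach Tim describes (the "-i" identifier flag combined with the "-m" maximum flag) might be scripted roughly as below. This is only a sketch under assumptions, not an official recipe. The handles, the batch size, and the helper name build_filter_media_cmd are hypothetical placeholders, and the script only prints the commands (a dry run) so they can be reviewed before being run from the DSpace bin directory. The "-f" flag is deliberately left out, so files that already have thumbnails should be skipped as described above.

```shell
#!/bin/sh
# Dry run: print one filter-media command per collection instead of running it.
# All values below are hypothetical placeholders; replace them with your own.
DSPACE_BIN="./dspace"
BATCH_SIZE=500
HANDLES="123456789/2 123456789/3"

# Build a command restricted to one community/collection/item (-i) and
# capped by the maximum flag (-m). No -f, so nothing is force-regenerated.
build_filter_media_cmd() {
    printf '%s filter-media -i %s -m %s\n' "$DSPACE_BIN" "$1" "$2"
}

for handle in $HANDLES; do
    build_filter_media_cmd "$handle" "$BATCH_SIZE"
done
```

Once the printed commands look right, they can be run one at a time, or from cron, so that a large repository is processed in smaller pieces rather than in one multi-week run. Note that this does not avoid the scan of already processed files within each collection; it only limits how much work a single run takes on.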
