Hi Sai & Daan,

The filter-media script always loops through all objects to *determine* which ones need to be processed. This script is responsible *not only* for thumbnails, but also for extracting text for indexing purposes (and for any other actions that are enabled as "filter.plugins" in your dspace.cfg). See the full docs at https://wiki.lyrasis.org/display/DSDOC7x/Mediafilters+for+Transforming+DSpace+Content

So, this script doesn't keep a list of objects that already have generated thumbnails. The reason is that, even if a file has a generated thumbnail, it may still need to be processed by other filters (e.g. to extract its textual content for full-text indexing). So, every time you run "filter-media" it will loop through every file... but it will skip any file that it notices was already processed (e.g. if the file already has a thumbnail or extracted text, it will not regenerate it unless you use the "-f" flag to force regeneration).

The "skip mode" (-s flag) can also be used to tell it to skip entire communities/collections/items... but then it will never process that object again until it is removed from the skip list. So, this should be used sparingly, unless you are sure the object will never need a new thumbnail, full-text indexing, etc.

There are options to process files little by little (using the "-m" or maximum flag) or even community-by-community or collection-by-collection (using the "-i" or identifier flag) in order to break a larger job down into smaller chunks.

This is simply how this tool works at this time. I do agree there may be ways to make it more efficient. But we haven't had a developer volunteer to do such work or to redesign the current process. If you or anyone else out in the community is interested in helping to improve this tool, I'm sure the Committers would welcome ideas. All code in DSpace is built/supported by volunteers and users. We don't have a centralized development team (i.e. I have no developers working for me).

Semi-related to this, there have been past discussions about migrating all media filter scripts/tools into curation tasks (which would allow these processes to be run one-by-one as each new submission is added to DSpace, instead of via the current bulk processing script).
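
To make the chunking idea with the "-m" and "-i" flags concrete, here is a minimal sketch of a wrapper that runs filter-media one collection at a time with a per-run cap. It is an illustration under stated assumptions, not a tested recipe: the handles, the cap of 500, and the DSPACE_BIN / HANDLES / MAX_PER_RUN / DRY_RUN variable names are all hypothetical placeholders. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch: break a large filter-media job into per-collection chunks.
# ASSUMPTIONS (not DSpace defaults): the variable names below, the example
# handles, and the cap of 500 are placeholders for illustration only.
set -eu

DSPACE_BIN="${DSPACE_BIN:-./dspace}"              # path to the dspace launcher
HANDLES="${HANDLES:-123456789/10 123456789/11}"   # hypothetical collection handles
MAX_PER_RUN="${MAX_PER_RUN:-500}"                 # value passed to the "-m" flag
DRY_RUN="${DRY_RUN:-1}"                           # 1 = only print the commands

run_chunk() {
    # $1 = handle of the community/collection/item to process ("-i" flag)
    if [ "$DRY_RUN" = "1" ]; then
        echo "$DSPACE_BIN filter-media -i $1 -m $MAX_PER_RUN"
    else
        "$DSPACE_BIN" filter-media -i "$1" -m "$MAX_PER_RUN"
    fi
}

# Process each handle in turn; "-f" is deliberately omitted so that files
# which already have a thumbnail or extracted text are skipped.
for handle in $HANDLES; do
    run_chunk "$handle"
done
```

Setting DRY_RUN=0 would run the real commands. Each run still loops through the objects under the given handle, so this only narrows the scope of each run; it does not change how filter-media decides what to skip.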

There are some older tickets/PRs related to moving the media filters into curation tasks, but that work was never finished or found to be fully working. See https://github.com/DSpace/DSpace/issues/6398 and https://github.com/DSpace/DSpace/pull/1674 (That said, I'd love to see this work completed at some point.)

Tim

On Tuesday, June 11, 2024 at 8:58:49 AM UTC-5 [email protected] wrote:

> Hi Daan,
>
> Thankyou for your reply
>
> As you said if I have to restore an entire database and the assetstore, it
> depends whether the thumbnail have been generated before taking the backup,
> or if thumbnail were generated then no need to regenerate the thumbnail
> from the scratch(I may not be correct, if any information I have given is
> wrong please correct).
>
> As I wanted to know is that when I keep for generating thumbnail, why it
> starts from scratch(but the generated thumbnail gets skipped anyways)
>
> I thought is there any other method where already generated thumbnails
> does not get read and only generates the required(means which does not have
> thumbnails)
>
> Regards
> Sai Kumar S
>
> On Tue, 11 Jun 2024, 11:06 am Daan Lessing, <[email protected]> wrote:
>
>> Good morning,
>>
>> Just a follow-up question on this. Let's say for instance you have to
>> restore an entire database and the assetstore, do you lose all thumbnails
>> and will filter-media have to start building thumbnails from scratch?
>>
>> I have been running filter-media and it has been running for 3 weeks and
>> not yet completed.
>>
>> Looking forward to your response.
>>
>> Kind regards,
>> Daan
>>
>> On Tue, Jun 11, 2024 at 6:28 AM SAI KUMAR S <[email protected]>
>> wrote:
>>
>>> Hi Tim
>>>
>>> Thank you for the information.
>>>
>>> The issue is that when we run the command line *./dspace filter-media*,
>>> the thumbnail-generated files are also read, but they are skipped. This
>>> means the process reads the files from the beginning each time, which takes
>>> more time as the number of files increases.
>>>
>>> Is there any other method, such as executing a script, for generating
>>> thumbnails more efficiently?
>>> Regards
>>> Sai Kumar S
>>>
>>> On Tuesday 11 June 2024 at 02:37:15 UTC+5:30 DSpace Community wrote:
>>>
>>>> Hi Sai,
>>>>
>>>> If you run "filter-media" **without** the "-f" flag, then it should
>>>> automatically skip all Items that already have generated thumbnails. For
>>>> example:
>>>>
>>>> ./dspace filter-media
>>>>
>>>> When you run it **with** the "-f" flag, that tells the filter-media
>>>> script to **regenerate all thumbnails**.
>>>>
>>>> For more information see the documentation on this script
>>>> <https://wiki.lyrasis.org/display/DSDOC7x/Mediafilters+for+Transforming+DSpace+Content#MediafiltersforTransformingDSpaceContent-Executing(viaCommandLine)>
>>>> .
>>>>
>>>> (The "skip list" is only needed if you have files which are
>>>> consistently throwing errors and you want to *skip them from all future
>>>> runs* of the "filter-media" script. But, it shouldn't be necessary in
>>>> your use case.)
>>>>
>>>> Tim
>>>>
>>>> On Monday, June 10, 2024 at 5:09:33 AM UTC-5 [email protected]
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I have a query regarding filter-media. I have uploaded around 1000
>>>>> books to a collection and generated thumbnails for the PDF files using
>>>>> the command line *dspace filter-media -f.*
>>>>>
>>>>> However, when I upload another 1000 files to the same collection, I
>>>>> need to generate thumbnails only for the newly uploaded files. I tried
>>>>> using the skip mode by creating a *skip-list.txt*, but I am not
>>>>> getting the desired result.
>>>>>
>>>>> Could anyone of you provide me an example of how to correctly use the
>>>>> skip-list.txt method to generate thumbnails?
>>>>>
>>>>> Alternatively, is there any other method, such as using a script
>>>>> (e.g., Python), to generate the thumbnails for only the newly uploaded
>>>>> files?
>>>>>
>>>>> Please help me solve this query.
>>>>>
>>>>> Thanks & Regards
>>>>> Sai Kumar S
>>>>>
>>>>> --
>>> All messages to this mailing list should adhere to the Code of Conduct:
>>> https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "DSpace Community" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/dspace-community/07d120cd-74de-4420-b49d-d3ee6744738an%40googlegroups.com
>>> .
>>>
>>

To view this discussion on the web visit https://groups.google.com/d/msgid/dspace-community/6190fd47-32e8-4f7f-a1d5-3c9744dce5ean%40googlegroups.com.
