Re: Retrieving the EXIF date/time from 250k images

2023-07-07 Thread Alex Zavatone via Cocoa-dev
I’ll support Jim Crate’s suggestion of EXIFTool. I just came across this the 
other day while trying to remove the GPS location from a QuickTime video once I 
had copied it to my Mac.  Honestly, the quickest solution for me was to open it 
on another Mac in QuickTime 7 Pro, press command J, delete the EXIF tracks and 
save a new copy.

Ohh, how I wish I had my hands on the olden source for the QuickTime 7 Player.  
IIRC at one time, there was some Apple source code on the developer site.  
Should have snagged it while I had a chance.

Regarding Jim’s concern that NSTask would be too slow: if you do take this 
route, I wonder how much better it would be to spawn up to 4 NSTasks to grab 
this info. Measure the time of one fetch, add tasks until you hit diminishing 
returns, then match that to the number of CPU cores and the drive throughput.  
I love these kinds of things.  It’s a case of, “well, this sucks.  Can we make 
it quicker and, if we can, can we keep doing it until it stops sucking and is 
actually good enough?”

Cheers and happy Friday,
Alex Zavatone



> On Jan 15, 2023, at 5:59 PM, James Crate  wrote:
> 
> There is a Perl program called exiftool that can load and set EXIF data 
> without loading the image data (or at least it doesn’t decode the image 
> data). I don’t know whether it would be faster than loading image 
> data/properties with ImageIO. Instantiating perl/exiftool repeatedly for 
> each image in a separate NSTask would probably be pretty slow, so you could 
> write a Perl script that uses your bundled exiftool to load the EXIF data 
> for many files and output the results in a format your program can handle. 
> 
> Jim Crate
> 
> 

Re: Retrieving the EXIF date/time from 250k images

2023-01-16 Thread Steven Mills via Cocoa-dev
On Jan 16, 2023, at 00:52, Gabriel Zachmann via Cocoa-dev 
 wrote:
> 
> I had thought of that, too.
> The reason is that I have no experience whatsoever with the JPEG file format.
> Nor with the EXIF file format.
> Also, I need to be able to parse at least JPEG, PNG, TIF, GIF, and I'd like 
> to add HEIC.

The file format docs are out there for all of those (at least I assume HEIC is 
published, too), and many of them are pretty straightforward. Learning how to 
read those files is a good learning experience and keeps you from being at the 
mercy of third-party solutions that you have no control over, or that aren't 
efficient at getting right to the data you need.
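
To give a feel for how little parsing the JPEG case needs, here is a minimal 
sketch in plain C. It is an illustration only, not code from anyone in this 
thread: the name find_exif_offset is made up, it assumes the first few KB of 
the file are already in a buffer, and it covers JPEG only (PNG, TIFF, GIF and 
HEIC each need their own reader). It walks the marker segments after SOI and 
returns the offset of the TIFF header inside the first APP1 "Exif" segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scan the leading JPEG marker segments in buf[0..len) and return the byte
 * offset of the TIFF header inside the first APP1 "Exif" segment, or -1 if
 * none is found before the image data (SOS) or the end of the buffer.
 * Assumption: only length-prefixed segments appear before SOS. */
long find_exif_offset(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 4 || buf[0] != 0xFF || buf[1] != 0xD8)   /* no SOI marker */
        return -1;

    size_t pos = 2;
    while (pos + 4 <= len) {
        if (buf[pos] != 0xFF)
            return -1;                       /* not at a marker: give up */
        uint8_t marker = buf[pos + 1];
        if (marker == 0xFF) {                /* fill byte before a marker */
            pos++;
            continue;
        }
        if (marker == 0xDA || marker == 0xD9)   /* SOS or EOI: no EXIF */
            return -1;

        /* Segment length is big-endian and includes its own two bytes. */
        size_t seglen = ((size_t)buf[pos + 2] << 8) | buf[pos + 3];
        if (seglen < 2)
            return -1;

        if (marker == 0xE1 && seglen >= 8 && pos + 10 <= len &&
            memcmp(buf + pos + 4, "Exif\0\0", 6) == 0)
            return (long)(pos + 10);         /* TIFF header starts here */

        pos += 2 + seglen;                   /* skip to the next marker */
    }
    return -1;
}
```

The date tags still have to be located by walking the TIFF IFDs that start at 
the returned offset, which this sketch does not attempt.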

--
Steve Mills
Drummer, Mac geek

___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com


Re: Retrieving the EXIF date/time from 250k images

2023-01-15 Thread Gabriel Zachmann via Cocoa-dev
> Is there any reason why you don't want to do it yourself, simply loading the 
> first few KB of the file? I haven't been writing JPEG decoders for a long 
> time, but I'm pretty sure the EXIF (APPn) marker must be written before the 
> image data (the SOS marker(s)), so it is at the beginning of the file.

I had thought of that, too.
The reason is that I have no experience whatsoever with the JPEG file format.
Nor with the EXIF file format.
Also, I need to be able to parse at least JPEG, PNG, TIF, GIF, and I'd like to 
add HEIC.

Best regards, Gabriel

> You could simply parse the first markers of the file (0xFF, 0x??) and look 
> for the EXIF one. The JPEG file format (JFIF) is pretty simple to decode (I 
> can help).
>
> Regards,
>
> Raphaël
>
>
> On Sun, Jan 8, 2023 at 3:30 PM Chris Ridd via Cocoa-dev 
>  wrote:
>
>
> > On 7 Jan 2023, at 23:31, Gabriel Zachmann via Cocoa-dev 
> >  wrote:
> >
> >>
> >> I wonder if there is a method to load the EXIF data out of the files 
> >> without opening them completely.  That would seem like the ideal approach.
> >
> > That's the approach I am looking for, but I haven't found any API to do 
> > that.
>
> I’m sure you’ve tried the following, but in case you haven’t:
>
> * investigate mmapping each file into memory and use a version of the API 
> that takes NSData/CFData instead of a file/URL to get the EXIF information.
>
> * if the files are indexed by Spotlight, can you get the EXIF information 
> from an NSMetadataQuery search?
>
> Chris


Re: Retrieving the EXIF date/time from 250k images

2023-01-15 Thread James Crate via Cocoa-dev
There is a Perl program called exiftool that can load and set EXIF data without 
loading the image data (or at least it doesn’t decode the image data). I don’t 
know whether it would be faster than loading image data/properties with 
ImageIO. Instantiating perl/exiftool repeatedly for each image in a separate 
NSTask would probably be pretty slow, so you could write a Perl script that 
uses your bundled exiftool to load the EXIF data for many files and output the 
results in a format your program can handle. 
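
A rough sketch of the "one process, many files" idea in plain C, reading the 
tool's output through a pipe instead of launching a task per image. The 
function names are made up for illustration, and the command line in the usage 
note is an assumption to check against your exiftool version. The parser 
expects one "name<TAB>date" line per file, with "-" standing for a missing 
tag.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Split one "name<TAB>date" line in place. Returns 1 if both fields are
 * present and the date is not the "-" placeholder, otherwise 0. */
int split_exiftool_line(char *line, char **path, char **date)
{
    assert(line != NULL && path != NULL && date != NULL);
    line[strcspn(line, "\r\n")] = '\0';      /* strip the line ending */
    char *tab = strchr(line, '\t');
    if (tab == NULL)
        return 0;
    *tab = '\0';
    *path = line;
    *date = tab + 1;
    return strcmp(*date, "-") != 0 && **date != '\0';
}

/* Run `command` once and pass every (name, date) pair it prints to cb
 * (cb may be NULL). Returns the number of dated files, or -1 if the
 * command could not be started. */
long scan_with_exiftool(const char *command,
                        void (*cb)(const char *path, const char *date))
{
    FILE *pipe = popen(command, "r");
    if (pipe == NULL)
        return -1;

    char line[4096];
    char *path, *date;
    long dated = 0;
    while (fgets(line, sizeof line, pipe) != NULL) {
        if (split_exiftool_line(line, &path, &date)) {
            if (cb != NULL)
                cb(path, date);
            dated++;
        }
    }
    pclose(pipe);
    return dated;
}
```

A command along the lines of "exiftool -T -r -FileName -DateTimeOriginal DIR" 
should produce output in this shape, but that is untested here, and whether it 
beats ImageIO on a slow external disk would have to be measured.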

Jim Crate


> On Jan 7, 2023, at 2:07 PM, Alex Zavatone via Cocoa-dev 
>  wrote:
> 
> Hi Gabe.  I’d add basic logging before you start each image and after you 
> complete each image to see how long each one takes in each of your problem 
> tests, so you can see just how slow it is on your problem platforms.
> 
> Then you can add more logging to expose the problems and start to address 
> them once you see where the bottlenecks are.
> 
> I wonder if there is a method to load the EXIF data out of the files without 
> opening them completely.  That would seem like the ideal approach.
> 
> Cheers,
> Alex Zavatone
> 

Re: Retrieving the EXIF date/time from 250k images

2023-01-08 Thread Chris Ridd via Cocoa-dev


> On 7 Jan 2023, at 23:31, Gabriel Zachmann via Cocoa-dev 
>  wrote:
> 
>> 
>> I wonder if there is a method to load the EXIF data out of the files without 
>> opening them completely.  That would seem like the ideal approach.
> 
> That's the approach I am looking for, but I haven't found any API to do that.

I’m sure you’ve tried the following, but in case you haven’t:

* investigate mmapping each file into memory and use a version of the API that 
takes NSData/CFData instead of a file/URL to get the EXIF information. 

* if the files are indexed by Spotlight, can you get the EXIF information from 
an NSMetadataQuery search?
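
For the first suggestion, the mapping step might look roughly like this in 
plain C. The names map_file and unmap_file are made up for illustration. The 
mapped bytes could then be wrapped without copying (for example with 
CFDataCreateWithBytesNoCopy and kCFAllocatorNull) and handed to 
CGImageSourceCreateWithData; whether that actually reads less from disk than 
the URL-based call is not something this sketch demonstrates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct mapped_file {
    const unsigned char *data;   /* NULL when nothing is mapped */
    size_t size;
};

/* Map `path` read-only. Returns 0 on success; on failure returns -1 and
 * leaves *out empty. Pages are only read from disk when touched. */
int map_file(const char *path, struct mapped_file *out)
{
    assert(path != NULL && out != NULL);
    out->data = NULL;
    out->size = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                   /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    out->data = p;
    out->size = (size_t)st.st_size;
    return 0;
}

/* Release a mapping created by map_file. Safe to call on an empty struct. */
void unmap_file(struct mapped_file *m)
{
    assert(m != NULL);
    if (m->data != NULL)
        munmap((void *)m->data, m->size);
    m->data = NULL;
    m->size = 0;
}
```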

Chris


Re: Retrieving the EXIF date/time from 250k images

2023-01-07 Thread Gary L. Wade via Cocoa-dev
You don’t seem to understand modern filesystems or modern hardware like some of 
us.
--
Gary L. Wade
https://www.garywade.com/ 

> On Jan 7, 2023, at 3:29 PM, Gabriel Zachmann  wrote:
> 
> *Maybe* ...
> that would mean that the filesystem performs predictive caching like the 
> CPU's cache / memory management does ... 
> 
> Also, I think you might even get worse performance: imagine several threads 
> reading different files *at the same time*. Those files could lie on 
> different HD cylinders far apart from each other, so the disk heads would 
> have to move back and forth during those concurrent read operations, wouldn't 
> they? Unless the disk does some very clever caching / batching / prediction 
> itself ...
> 
> G.
> 



Re: Retrieving the EXIF date/time from 250k images

2023-01-07 Thread Gabriel Zachmann via Cocoa-dev
>
> I wonder if there is a method to load the EXIF data out of the files without 
> opening them completely.  That would seem like the ideal approach.

That's the approach I am looking for, but I haven't found any API to do that.

Best, G.





Re: Retrieving the EXIF date/time from 250k images

2023-01-07 Thread Gabriel Zachmann via Cocoa-dev
*Maybe* ...
that would mean that the filesystem performs predictive caching like the CPU's 
cache / memory management does ...

Also, I think you might even get worse performance: imagine several threads 
reading different files *at the same time*. Those files could lie on different 
HD cylinders far apart from each other, so the disk heads would have to move 
back and forth during those concurrent read operations, wouldn't they? Unless 
the disk does some very clever caching / batching / prediction itself ...

G.

> Since file systems and the associated hardware are designed to be efficient 
> by caching data and knowing things like where one file or block is in 
> relation to another, there’s a possibility these mechanisms could work to 
> your advantage, pulling in the data while “in the neighborhood.” There’s no 
> guarantee, of course, but I feel it’s worth considering.
> --
> Gary L. Wade
> http://www.garywade.com/
>
>> On Jan 7, 2023, at 10:37 AM, Gabriel Zachmann via Cocoa-dev 
>>  wrote:
>>
>> So, why would several threads loading those images in parallel help 
>> here? In my thinking, they will just compete for the same resource, i.e., 
>> the hard disk.
>>
>





Re: Retrieving the EXIF date/time from 250k images

2023-01-07 Thread Alex Zavatone via Cocoa-dev
Hi Gabe.  I’d add basic logging before you start each image and after you 
complete each image to see how long each one takes in each of your problem 
tests, so you can see just how slow it is on your problem platforms.

Then you can add more logging to expose the problems and start to address them 
once you see where the bottlenecks are.
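
As a rough illustration of that kind of per-image logging, here is a sketch in 
plain C. All names are made up, and read_first_bytes is only a stand-in for 
whatever per-image work is being measured (in Gabe's case, the ImageIO calls). 
Timing each file and logging only the slow ones keeps the output manageable 
for a few hundred thousand images.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Monotonic clock reading in seconds, unaffected by wall-clock changes. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Stand-in workload: read up to the first 64 KB of the file.
 * Returns the number of bytes read, or -1 if the file cannot be opened. */
int read_first_bytes(const char *path)
{
    unsigned char buf[65536];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return (int)n;
}

/* Time one call of work(path). If it took at least slow_threshold seconds,
 * log the duration, result and path to stderr. Returns elapsed seconds. */
double time_one_file(const char *path, int (*work)(const char *),
                     double slow_threshold)
{
    assert(path != NULL && work != NULL);
    double start = now_seconds();
    int rc = work(path);
    double elapsed = now_seconds() - start;
    if (elapsed >= slow_threshold)
        fprintf(stderr, "%10.3f ms  rc=%d  %s\n", elapsed * 1000.0, rc, path);
    return elapsed;
}
```

Summing the returned values per folder would also show whether the slow runs 
come from a few very large files or from a uniform per-file cost.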

I wonder if there is a method to load the EXIF data out of the files without 
opening them completely.  That would seem like the ideal approach.

Cheers,
Alex Zavatone


Re: Retrieving the EXIF date/time from 250k images

2023-01-07 Thread Gary L. Wade via Cocoa-dev
Since file systems and the associated hardware are designed to be efficient by 
caching data and knowing things like where one file or block is in relation to 
another, there’s a possibility these mechanisms could work to your advantage, 
pulling in the data while “in the neighborhood.” There’s no guarantee, of 
course, but I feel it’s worth considering.
--
Gary L. Wade
http://www.garywade.com/

> On Jan 7, 2023, at 10:37 AM, Gabriel Zachmann via Cocoa-dev 
>  wrote:
> 
> So, why would several threads loading those images in parallel help 
> here? In my thinking, they will just compete for the same resource, i.e., 
> the hard disk.
> 



Re: Retrieving the EXIF date/time from 250k images

2023-01-07 Thread Gabriel Zachmann via Cocoa-dev
Hi Alex, hi everyone,

thanks a lot for the many suggestions!
And sorry for following up on this so late!
I hope you are still willing to engage in this discussion.

Yes, Alex, I agree that the main question is:
how can I get the metadata of a large number of images (say, 100k-300k) 
*without* actually loading the whole image files?
(For your reference: I am interested in the date tags embedded in the EXIF 
dictionary, and those dates will be read just once per image, then cached in a 
dictionary containing filename & dates, and that dictionary will get stored on 
disk for future use by the app.)

> CGImageSourceRef imageSourceRef = 
> CGImageSourceCreateWithURL((CFURLRef)imageUrl, NULL);

I have tried this:

   for ( NSString* filename in imagefiles )
   {
      NSURL * imgurl = [NSURL fileURLWithPath: filename isDirectory: NO];
      CGImageSourceRef sourceref = CGImageSourceCreateWithURL( (__bridge CFURLRef) imgurl, NULL );
   }

This takes 1 minute for around 300k images stored on my internal SSD.
That would be OK.

However, if performed on a folder stored on an external hard disk, I get the 
following timings:

  - 20 min for 150k images (45 GB)
  - 12 min for 150k images (45 GB), second time
  - 150 sec for 25k images (18 GB)
  - 170 sec for 25k images (18 GB), with the lines below (*)
  - 80 sec for 22k (3 GB) images
  - 80 sec for 22k (3 GB) images, with the lines below (*)

All experiments were done on different folders on the same hard disk, a WD 
MyPassport Ultra, 1 TB, USB-A connector to a MacBook Air M2.
Timings listed with the same number of files and GB were measured on the same 
folder.

(*): these were timings where I added the following lines to the loop:

    // sourceref is the CGImageSourceRef created in the loop above
    CFDictionaryRef fileProps = CGImageSourceCopyPropertiesAtIndex( sourceref, 0, NULL );
    // fileProps comes from a Copy function, so it needs a matching CFRelease()
    bool success = CFDictionaryGetValueIfPresent( fileProps, kCGImagePropertyExifDictionary, (const void **) & exif_dict );
    CFDictionaryGetValueIfPresent( exif_dict, kCGImagePropertyExifDateTimeDigitized, (const void **) & dateref );
    iso_date = [isoDateFormatter_ dateFromString: (__bridge NSString * _Nonnull)(dateref) ];
    [datesAndTimes_ addObject: iso_date ];

(Plus some error checking, which I omit here.)
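
One detail worth checking in the date step: EXIF stores date/time values as 
"YYYY:MM:DD HH:MM:SS", with colons in the date part, so a formatter set up for 
ISO 8601 strings would not match them. How isoDateFormatter_ is configured is 
not shown here, so this may already be handled. As an illustration of the 
format only, a plain C sketch (the function name is made up):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Parse an EXIF date string of the form "YYYY:MM:DD HH:MM:SS" into *out.
 * Returns 1 on success, 0 if the fields cannot be read or are out of range.
 * EXIF carries no time zone in this tag, so none is applied. */
int parse_exif_datetime(const char *s, struct tm *out)
{
    assert(out != NULL);
    int y, mo, d, h, mi, sec;
    char extra;

    if (s == NULL)
        return 0;
    /* Exactly six fields must convert; a seventh means trailing garbage. */
    if (sscanf(s, "%4d:%2d:%2d %2d:%2d:%2d%c",
               &y, &mo, &d, &h, &mi, &sec, &extra) != 6)
        return 0;
    if (y < 0 || mo < 1 || mo > 12 || d < 1 || d > 31 ||
        h < 0 || h > 23 || mi < 0 || mi > 59 || sec < 0 || sec > 60)
        return 0;

    memset(out, 0, sizeof *out);
    out->tm_year = y - 1900;     /* struct tm counts years from 1900 */
    out->tm_mon = mo - 1;        /* and months from 0 */
    out->tm_mday = d;
    out->tm_hour = h;
    out->tm_min = mi;
    out->tm_sec = sec;
    out->tm_isdst = -1;          /* unknown */
    return 1;
}
```

The all-zero string that some cameras write for an unknown date is rejected by 
the month and day range checks.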

First of all, we can see that the vast majority of time is spent on 
CGImageSourceCreateWithURL().
Second, there seem to be some caching effects, although I have a hard time 
understanding that, but that is not the point.
Third, the durations are not linear; I guess it might have something to do with 
the sizes of the files, too, but again, didn't investigate further.

So, it looks to me like CGImageSourceCreateWithURL() really loads the complete 
image file.

I don't see why Ole Begemann (ref'ed in Alex's post) can claim his approach 
does not load the whole image.


Some people suggested parallelizing the whole task, using dispatch_queue_create 
or NSOperationQueue.
(Thanks Steve, Gary, Jack!)
Before restructuring my code for that, I would like to better understand why 
you think that will speed things up.
The code above does pretty much no computation, so most of the time is, I 
guess, spent waiting for the data to arrive from the hard disk.
So, why would several threads loading those images in parallel help here? 
In my thinking, they will just compete for the same resource, i.e., the hard 
disk.


I also googled quite a bit, to no avail.

Any and all hints, suggestions, and insights will be highly appreciated!
Best, Gab


>


> if (!imageSourceRef)
>     return;
>
> CFDictionaryRef props = CGImageSourceCopyPropertiesAtIndex(imageSourceRef, 0, NULL);
>
> NSDictionary *properties = (NSDictionary*)CFBridgingRelease(props);
>
> if (!properties) {
>     return;
> }
>
> // The NSNumber objects and the int results need distinct names.
> NSNumber *heightNumber = [properties objectForKey:@"PixelHeight"];
> NSNumber *widthNumber = [properties objectForKey:@"PixelWidth"];
> int height = 0;
> int width = 0;
>
> if (heightNumber) {
>     height = [heightNumber intValue];
> }
> if (widthNumber) {
>     width = [widthNumber intValue];
> }
>
>
> Or this link by Ole Begemann?
>
> https://oleb.net/blog/2011/09/accessing-image-properties-without-loading-the-image-into-memory/
>
> I love these questions.  I find out more about iOS programming by researching 
> other people’s problems than the ones that I’m currently faced with.
>
> Hopefully some of these will help.
>
> Cheers,
> Alex Zavatone





Re: Retrieving the EXIF date/time from 250k images

2022-08-18 Thread James Crate via Cocoa-dev
On Aug 18, 2022, at 7:47 AM, Mike Abdullah  wrote:
> 
> It’s not a very good fit, but when you say a “GCD concurrent queue”, you’d 
> need to be more specific. There are several configs possible. Do you mean a 
> global queue, or one you made yourself? If you made it yourself, how did you 
> configure it?

Like Alex, for me this was 10-12 years ago, so I don’t remember exactly how I 
tried to use GCD. It spun up too many threads; I re-read the docs and assumed 
it was spinning up new threads whenever file I/O blocked, and choking itself 
in the process. I didn’t see a handy way to limit concurrency on a GCD 
concurrent queue, whereas NSOperationQueue had maxConcurrentOperationCount 
documented. NSOperationQueue was also more idiomatic Objective-C and easier to 
use/understand in an Obj-C program. 

> The tricky problem is that GCD aims to optimise CPU usage. Reading image 
> metadata from disk is almost certainly not CPU bound; it’s going to be 
> limited by disk speed and the file system instead. GCD will have a tendency 
> to see tasks blocked on doing this, not occupying the CPU, and so spin up 
> another task in the hope that one _will_ then use the CPU, eventually 
> grinding to a halt as a huge number of threads contend for access to the file 
> system.

In my case the app reads hundreds of images from disk or a network file share, 
performs operations (scale/rotate/crop/etc.), renders on the CPU (better 
accuracy for printing), and writes to disk or a network share. That is a 
perfect fit for NSOperationQueue with maxConcurrentOperationCount, so I was 
grateful someone at Apple had already done the hard work and made concurrent 
programming much easier for people with tasks like mine.

The OP is just grabbing image metadata but would almost certainly run into the 
same problem using GCD's dispatch_async so they’ll also be much better off 
using NSOperationQueue with maxConcurrentOperationCount. 
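
The underlying idea, a fixed cap on how many files are in flight at once, can 
be sketched in plain C with pthreads. This is not how NSOperationQueue is 
implemented, just an illustration of bounded concurrency; every name below is 
made up, and count_path is only a placeholder for the real per-file work.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef void (*file_fn)(const char *path, void *ctx);

struct pool {
    const char **paths;
    size_t count;
    size_t next;              /* index of the next unclaimed path */
    pthread_mutex_t lock;     /* protects `next` */
    file_fn fn;
    void *ctx;
};

/* Each worker claims one path at a time until none are left. */
static void *worker(void *arg)
{
    struct pool *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        size_t i = p->next;
        if (i < p->count)
            p->next++;
        pthread_mutex_unlock(&p->lock);
        if (i >= p->count)
            return NULL;
        p->fn(p->paths[i], p->ctx);
    }
}

/* Run fn on every path using at most max_workers threads (capped at 64).
 * Returns 0 on success, -1 if no worker thread could be started. */
int process_bounded(const char **paths, size_t count, unsigned max_workers,
                    file_fn fn, void *ctx)
{
    assert(fn != NULL && max_workers > 0);
    pthread_t threads[64];
    if (max_workers > 64)
        max_workers = 64;

    struct pool p;
    p.paths = paths;
    p.count = count;
    p.next = 0;
    p.fn = fn;
    p.ctx = ctx;
    pthread_mutex_init(&p.lock, NULL);

    unsigned started = 0;
    while (started < max_workers &&
           pthread_create(&threads[started], NULL, worker, &p) == 0)
        started++;
    for (unsigned i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    pthread_mutex_destroy(&p.lock);
    return started > 0 ? 0 : -1;
}

/* Placeholder work: count how many paths were processed. */
struct counter {
    pthread_mutex_t lock;
    size_t n;
};

void count_path(const char *path, void *ctx)
{
    struct counter *c = ctx;
    (void)path;
    pthread_mutex_lock(&c->lock);
    c->n++;
    pthread_mutex_unlock(&c->lock);
}
```

For an I/O-bound job on a single spinning disk, the right cap may well be 
lower than the core count, so it is worth trying a few values and measuring.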

Jim Crate



___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com


Re: Retrieving the EXIF date/time from 250k images

2022-08-18 Thread Alex Zavatone via Cocoa-dev


> On Aug 18, 2022, at 6:47 AM, Mike Abdullah via Cocoa-dev 
>  wrote:
> 
> It’s not a very good fit, but when you say a “GCD concurrent queue”, you’d 
> need to be more specific. There are several configs possible. Do you mean a 
> global queue, or one you made yourself? If you made it yourself, how did you 
> configure it?
> 

That was ~14 years ago.  I’m just sharing what I remember.  In my case, we 
wanted the threads to read and then perform actions on the images.

Of course, reading images from media will be I/O bound from the device they’re 
being read from.  

Cheers,
Alex Zavatone


Re: Retrieving the EXIF date/time from 250k images

2022-08-18 Thread Mike Abdullah via Cocoa-dev
It’s not a very good fit, but when you say a “GCD concurrent queue”, you’d need 
to be more specific. There are several configs possible. Do you mean a global 
queue, or one you made yourself? If you made it yourself, how did you configure 
it?

The tricky problem is that GCD aims to optimise CPU usage. Reading image 
metadata from disk is almost certainly not CPU bound; it’s going to be limited 
by disk speed and the file system instead. GCD will have a tendency to see 
tasks blocked on doing this, not occupying the CPU, and so spin up another task 
in the hope that one _will_ then use the CPU, eventually grinding to a halt as 
a huge number of threads contend for access to the file system.

Some kind of limit is wise.

You may also wish to look at dispatch_apply(), which does work in parallel, but 
via a serial API. It’s a concurrent version of a for loop.
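
A minimal sketch of the dispatch_apply() idea, assuming ARC. `DatesForPaths` and the caller-supplied `readDate` block are hypothetical names; the block stands in for whatever code extracts the date from one file.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: calls `readDate` for every path in parallel and
// returns the results in the same order as `paths`. Files without a date
// are left as NSNull.
static NSArray *DatesForPaths(NSArray<NSString *> *paths,
                              NSDate *(^readDate)(NSString *path))
{
    NSMutableArray *results = [NSMutableArray arrayWithCapacity:paths.count];
    for (NSUInteger i = 0; i < paths.count; i++) {
        [results addObject:[NSNull null]];
    }

    dispatch_queue_t queue = dispatch_get_global_queue(QOS_CLASS_UTILITY, 0);
    // dispatch_apply returns only after every iteration has run; the
    // iterations may execute concurrently and in any order.
    dispatch_apply(paths.count, queue, ^(size_t i) {
        @autoreleasepool {
            NSDate *date = readDate(paths[i]);
            if (date != nil) {
                // NSMutableArray is not thread-safe, so serialize the write.
                @synchronized (results) {
                    results[i] = date;
                }
            }
        }
    });
    return results;
}
```

Because each result is stored at the index of its input, no sorting is needed to restore the original order.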

Even better, for what you’re asking for, there’s a very good chance that 
Spotlight has the info you want indexed and on-hand. You can look into whether 
that is available via an API as a fast path to take when possible.
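
One possible shape for that fast path is the MDItem API in CoreServices. This is a sketch under assumptions: `SpotlightDateForPath` is a hypothetical name, ARC is assumed, and whether `kMDItemContentCreationDate` matches the EXIF tag you care about for your files is something to verify, not a given. The caller should fall back to ImageIO when it returns nil.

```objc
#import <Foundation/Foundation.h>
#import <CoreServices/CoreServices.h>

// Returns the content-creation date Spotlight has indexed for the file,
// or nil if the file is not indexed or has no such attribute.
static NSDate *SpotlightDateForPath(NSString *path)
{
    NSURL *url = [NSURL fileURLWithPath:path isDirectory:NO];
    MDItemRef item = MDItemCreateWithURL(kCFAllocatorDefault,
                                         (__bridge CFURLRef)url);
    if (item == NULL) {
        return nil;
    }
    CFTypeRef value = MDItemCopyAttribute(item, kMDItemContentCreationDate);
    CFRelease(item);
    if (value == NULL) {
        return nil;
    }
    if (CFGetTypeID(value) != CFDateGetTypeID()) {
        CFRelease(value);
        return nil;
    }
    // Hand ownership of the copied CFDate to ARC.
    return (__bridge_transfer NSDate *)value;
}
```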

Mike.




Re: Retrieving the EXIF date/time from 250k images

2022-08-17 Thread Alex Zavatone via Cocoa-dev
Hi Jim.  You did exactly what I did.  You found the level of diminishing 
returns on threads.  

Back in 2008, while working on FiOS TV for Verizon, I wrote the 
design-to-development automation pipeline that automated the export of graphic 
assets from Illustrator, Photoshop and ImageOptim by Kornel Lesiński.  It was 
written in AppleScript (kill me now) wrapped in AppleScript-Obj-C in Xcode 3.x, 
IIRC.  In spawning processes for image optimization, I tried the “make a thread 
for each image” approach and quickly ran into what you did.

BUT, if they were spawned, suspended and then added to an operation queue, you 
can let the operation queue do the heavy lifting.  Just as you mentioned.

One issue you could have run into is if the media you were writing to took 
longer than the image processing itself, creating a write bottleneck.

Thanks for posting the processInfo.processorCount.  That’s what I was thinking 
about but couldn’t remember.  


FYI, ImageOptim is a fun tool for smallifying images in both lossless and lossy 
operation.  It’s got one special trick I discovered added to it to get extra 
smallification.  In the images I tested, we went from ~ 18% smaller to 24 and 
then 28% smaller using my technique that’s wrapped into ImageOptim, which uses 
multiple tools and multiple parameter iterations and compares the image sizes 
to pick the smallest.  Then it wraps that with my technique until no more 
smallification can be achieved.  It’s fun.  It’s free.  And Kornel is a great 
guy for making this available if this matters to you.   Here’s the link.
https://imageoptim.com/mac

Cheers,
Alex Zavatone



Re: Retrieving the EXIF date/time from 250k images

2022-08-17 Thread James Crate via Cocoa-dev
I have an app that does some image processing, and when I tried to use GCD it 
created several hundred threads which didn’t work very well. NSOperationQueue 
allows you to set the max concurrent operations, and the batch exporting 
process fully utilizes all logical cores on the CPU.

opsQueue.maxConcurrentOperationCount = NSProcessInfo.processInfo.processorCount;

Maybe I was using GCD wrong, or maybe reading, processing, and writing several 
hundred images is not a good fit for GCD concurrent queue? In any case 
NSOperationQueue is easy to use and works well.
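
A minimal sketch of that setup, assuming ARC. `ProcessPaths` and the `work` block are hypothetical names; only the `maxConcurrentOperationCount` line comes from the message above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: runs `work` for every path on an operation queue
// capped at one operation per logical core, and returns when all are done.
static void ProcessPaths(NSArray<NSString *> *paths,
                         void (^work)(NSString *path))
{
    NSOperationQueue *opsQueue = [[NSOperationQueue alloc] init];
    opsQueue.maxConcurrentOperationCount =
        (NSInteger)NSProcessInfo.processInfo.processorCount;

    for (NSString *path in paths) {
        [opsQueue addOperationWithBlock:^{
            work(path);
        }];
    }
    // Blocks the calling thread, so do not call this on the main thread.
    [opsQueue waitUntilAllOperationsAreFinished];
}
```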

Jim Crate




Re: Retrieving the EXIF date/time from 250k images

2022-08-16 Thread Alex Zavatone via Cocoa-dev
I love experimenting with processes like this.  Create as many operations as 
you have cores on your box and run a few tests to see how quickly they 
complete.  Try it with 2x the number of cores, 1x the number of cores, 1/2 x 
the number of cores, the number of cores + 1 and the number of 
cores - 1 to see if there are any substantial differences in time to completion.
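
A minimal sketch of such an experiment, assuming ARC. `TimeRun` and its parameters are hypothetical names; `job` stands in for processing one image.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: runs `job` `jobs` times on a queue limited to
// `width` concurrent operations and returns the elapsed wall-clock seconds.
static NSTimeInterval TimeRun(NSInteger width, NSUInteger jobs,
                              void (^job)(void))
{
    NSOperationQueue *queue = [[NSOperationQueue alloc] init];
    queue.maxConcurrentOperationCount = width;

    NSDate *start = [NSDate date];
    for (NSUInteger i = 0; i < jobs; i++) {
        [queue addOperationWithBlock:job];
    }
    [queue waitUntilAllOperationsAreFinished];
    // timeIntervalSinceNow is negative for a date in the past.
    return -[start timeIntervalSinceNow];
}
```

Calling it with widths of cores / 2, cores - 1, cores, cores + 1 and 2 x cores, where cores is `NSProcessInfo.processInfo.processorCount`, and logging each result gives the comparison described above. Repeat each run a few times, because the file system cache can skew the first pass.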

But as the others have said, GCD and NSOperationQueue were made for these 
things.  If you’re actually trying to get work done and not researching for 
your own benefit, those are the ones to use.  They do all the heavy lifting for 
you and are easy to implement.

Cheers.
Alex Zavatone



Re: Retrieving the EXIF date/time from 250k images

2022-08-16 Thread Jack Brindle via Cocoa-dev
Instead of using NSOperationQueue, I would use GCD to handle the tasks. Create 
a new concurrent queue (dispatch_queue_create(label, DISPATCH_QUEUE_CONCURRENT)), 
then enqueue the individual items to the queue for processing (dispatch_async(), 
using the queue created above). Everything can be handled in blocks, including 
the completion routines. As Christian says the problem then is that data may 
not be in the original order so you will probably want to sort the returned 
objects when done. This should significantly speed up the time to do the whole 
task.
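
A minimal sketch of this approach, assuming ARC. `CollectDates`, the queue labels and the `readDate` block are hypothetical names. It uses dispatch_group_async() rather than plain dispatch_async() so the caller can wait for completion, and it puts no limit on the number of blocks in flight, which matters when the work blocks on file I/O.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: reads a date for every path on a concurrent queue,
// gathers the results on a serial queue, then sorts them.
static NSArray<NSDate *> *CollectDates(NSArray<NSString *> *paths,
                                       NSDate *(^readDate)(NSString *path))
{
    dispatch_queue_t workers =
        dispatch_queue_create("example.exif.workers", DISPATCH_QUEUE_CONCURRENT);
    dispatch_queue_t collector =
        dispatch_queue_create("example.exif.collector", DISPATCH_QUEUE_SERIAL);
    dispatch_group_t group = dispatch_group_create();
    NSMutableArray<NSDate *> *dates = [NSMutableArray array];

    for (NSString *path in paths) {
        dispatch_group_async(group, workers, ^{
            @autoreleasepool {
                NSDate *date = readDate(path);
                if (date != nil) {
                    // Only the serial collector queue touches `dates`.
                    dispatch_sync(collector, ^{
                        [dates addObject:date];
                    });
                }
            }
        });
    }
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);

    // Completion order is arbitrary, so sort once everything has arrived.
    [dates sortUsingSelector:@selector(compare:)];
    return dates;
}
```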

Jack




Re: Retrieving the EXIF date/time from 250k images

2022-08-16 Thread Steve Christensen via Cocoa-dev
You mentioned creating and managing threads on your own, but that’s what 
NSOperationQueue —and the lower-level DispatchQueue— does. It also will be more 
efficient with thread management since it has an intimate understanding of the 
capabilities of the processor, etc., and will work to do the “right thing” on a 
per-device basis.

By leveraging NSOperationQueue and keeping each of the queue operations 
focused on a single file, you’re not complicating the management of what to 
do next, since most of that is handled for you. Let NSOperationQueue do the 
heavy lifting (scheduling work) and focus on your part of the task (performing 
the work).

Steve



Re: Retrieving the EXIF date/time from 250k images

2022-08-16 Thread Gabriel Zachmann via Cocoa-dev
That is a good idea.  Thanks a lot!

Maybe, I can turn this into more fine-grained, dynamic load balancing (or 
latency hiding), as follows:
create a number of threads (workers);
as soon as a worker is finished with their "current" image, it gets the next 
one (a piece of work) out of the list, processes it, and stores the iso_date in 
the output array (dates_and_times).
Both accesses to the pointer to the currently next piece of work, and the 
output array would need to be made exclusive, of course.
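
A minimal sketch of this worker scheme, assuming ARC. `RunWorkers`, the queue label and the `process` block are hypothetical names. The workers here are long-running GCD blocks rather than hand-made threads, and one NSLock guards both the "next piece of work" counter and the output array.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: `workerCount` workers repeatedly claim the next
// unprocessed index, process that path, and store the result at the same
// index. Paths with no result are left as NSNull.
static NSArray *RunWorkers(NSArray<NSString *> *paths,
                           NSUInteger workerCount,
                           id (^process)(NSString *path))
{
    NSMutableArray *output = [NSMutableArray arrayWithCapacity:paths.count];
    for (NSUInteger i = 0; i < paths.count; i++) {
        [output addObject:[NSNull null]];
    }

    NSLock *lock = [[NSLock alloc] init];
    __block NSUInteger next = 0;   // index of the next piece of work
    dispatch_queue_t queue =
        dispatch_queue_create("example.workers", DISPATCH_QUEUE_CONCURRENT);
    dispatch_group_t group = dispatch_group_create();

    for (NSUInteger w = 0; w < workerCount; w++) {
        dispatch_group_async(group, queue, ^{
            for (;;) {
                // Claim an index exclusively.
                [lock lock];
                NSUInteger i = next;
                if (i < paths.count) {
                    next++;
                }
                [lock unlock];
                if (i >= paths.count) {
                    break;   // nothing left to do
                }
                @autoreleasepool {
                    id result = process(paths[i]);
                    if (result != nil) {
                        [lock lock];
                        output[i] = result;
                        [lock unlock];
                    }
                }
            }
        });
    }
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    return output;
}
```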

Best regards, Gabriel






Re: Retrieving the EXIF date/time from 250k images

2022-08-16 Thread Steve Christensen via Cocoa-dev
One way to speed it up is to do as much work as possible in parallel. One way 
—and this is just off the top of my head— is:

1. Create a NSOperationQueue, and add a single operation on that queue to 
manage the entire process. (This is because some parts of the process are 
synchronous and might take a while and you don’t want to block the UI thread.)

2. The operation would create another worker NSOperationQueue where operations 
are added that each process a single image file (the contents of your `for` 
loop).

3. The manager operation adds operations to the worker queue to process a 
reasonable chunk of the files (10? 50?) and then waits for those operations to 
complete. (NSOperationQueue has -waitUntilAllOperationsAreFinished for this.) It 
then repeats until all the image files have been processed.

4. As each chunk completes, it can report status to the UI thread via a 
notification or some other means.

Unlike your synchronous implementation, below, the order of updates to the 
dates_and_times array is indeterminate. A way to fix it is to pre-populate it 
with as many placeholder items (NSDate.distantPast?) as are in imagefiles and 
then store iso_date at the same index as its corresponding filename. Another 
benefit is that there is a single memory allocation at the beginning rather 
than periodic resizes of the array (and copying the existing contents) as items 
are added.

And since all these operations run on different threads, you need to protect 
access to your dates_and_times array, because modifying it is not thread-safe. 
One quick way is to create an NSLock and lock it around the array update:

[theLock lock];
dates_and_times[index] = iso_date;
[theLock unlock];
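
Put together, the placeholder array and the lock could look like the sketch below, assuming ARC. `DatesInOrder` and the `readDate` block are hypothetical names; the other variable names follow the thread.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: fills dates_and_times so that index i holds the date
// for imagefiles[i]; files without a date keep NSDate.distantPast.
static NSArray<NSDate *> *DatesInOrder(NSArray<NSString *> *imagefiles,
                                       NSDate *(^readDate)(NSString *path))
{
    NSMutableArray<NSDate *> *dates_and_times =
        [NSMutableArray arrayWithCapacity:imagefiles.count];
    for (NSUInteger i = 0; i < imagefiles.count; i++) {
        [dates_and_times addObject:NSDate.distantPast];   // placeholder
    }

    NSLock *theLock = [[NSLock alloc] init];
    NSOperationQueue *workerQueue = [[NSOperationQueue alloc] init];
    workerQueue.maxConcurrentOperationCount =
        (NSInteger)NSProcessInfo.processInfo.processorCount;

    [imagefiles enumerateObjectsUsingBlock:
        ^(NSString *filename, NSUInteger index, BOOL *stop) {
            [workerQueue addOperationWithBlock:^{
                NSDate *iso_date = readDate(filename);
                if (iso_date == nil) {
                    return;
                }
                [theLock lock];
                dates_and_times[index] = iso_date;
                [theLock unlock];
            }];
        }];
    [workerQueue waitUntilAllOperationsAreFinished];
    return dates_and_times;
}
```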

Anyway, another way to look at the process.

Steve


> On Aug 14, 2022, at 2:22 PM, Gabriel Zachmann via Cocoa-dev 
>  wrote:
> 
> I would like to collect the date/time stored in an EXIF tag in a bunch of 
> images.
> 
> I thought I could do so with the following procedure
> (some details and error checking omitted for sake of clarity):
> 
> 
>     NSMutableArray * dates_and_times = [NSMutableArray arrayWithCapacity: [imagefiles count]];
>     CFDictionaryRef exif_dict;
>     CFStringRef dateref = NULL;
>     for ( NSString* filename in imagefiles )
>     {
>         NSURL * imgurl = [NSURL fileURLWithPath: filename isDirectory: NO]; // escapes any chars that are not allowed in URLs (space, &, etc.)
>         CGImageSourceRef image = CGImageSourceCreateWithURL( (__bridge CFURLRef) imgurl, NULL );
>         CFDictionaryRef fileProps = CGImageSourceCopyPropertiesAtIndex( image, 0, NULL );
>         bool success = CFDictionaryGetValueIfPresent( fileProps, kCGImagePropertyExifDictionary, (const void **) & exif_dict );
>         success = CFDictionaryGetValueIfPresent( exif_dict, kCGImagePropertyExifDateTimeDigitized, (const void **) & dateref );
>         NSString * date_str = [[NSString alloc] initWithString: (__bridge NSString * _Nonnull)( dateref ) ];
>         NSDate * iso_date = [isoDateFormatter_ dateFromString: date_str];
>         if ( iso_date )
>             [dates_and_times addObject: iso_date ];
>         CFRelease( fileProps );
>     }
> 
> 
> But, I get the impression, this code actually loads each and every image.
> On my Macbook, it takes 3m30s for 250k images (130GB).
> 
> So, the big question is: can it be done faster?
> 
> I know the EXIF tags are part of the image file, but I was hoping it might be 
> possible to load only those EXIF dictionaries.
> Or are the CGImage functions above already clever enough to implement this 
> idea?
> 
> 
> Best regards, Gab.



Re: Retrieving the EXIF date/time from 250k images

2022-08-15 Thread Gary L. Wade via Cocoa-dev
You should do any whitespace trimming first and be sure your date formatter is 
set correctly as any deviation will almost always fail.
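
A minimal sketch of trimming plus a fixed-format formatter, assuming ARC. The EXIF date/time tags use the layout YYYY:MM:DD HH:MM:SS with no time zone; `DateFromExifString` is a hypothetical name, and interpreting the value as UTC is an assumption made only to get stable, comparable dates.

```objc
#import <Foundation/Foundation.h>

// Parses an EXIF date string such as "2022:08:14 14:22:00", ignoring
// leading and trailing whitespace. Returns nil if the string does not match.
static NSDate *DateFromExifString(NSString *raw)
{
    static NSDateFormatter *formatter;
    static dispatch_once_t once;
    dispatch_once(&once, ^{
        formatter = [[NSDateFormatter alloc] init];
        // A fixed locale keeps user settings from changing the parse.
        formatter.locale =
            [NSLocale localeWithLocaleIdentifier:@"en_US_POSIX"];
        formatter.dateFormat = @"yyyy:MM:dd HH:mm:ss";
        // Assumption: treat the time as UTC, since the tag has no zone.
        formatter.timeZone = [NSTimeZone timeZoneForSecondsFromGMT:0];
    });

    NSString *trimmed = [raw stringByTrimmingCharactersInSet:
        NSCharacterSet.whitespaceAndNewlineCharacterSet];
    return [formatter dateFromString:trimmed];
}
```

With the trim in place, the question of whether dateFromString tolerates stray whitespace no longer matters.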
--
Gary

> On Aug 15, 2022, at 2:51 AM, Gabriel Zachmann  wrote:
> 
> A detail I left out in-between: whitespace trimming.
> 
> Or, does anyone know if dateFromString will handle trailing/leading 
> whitespaces gracefully?



Re: Retrieving the EXIF date/time from 250k images

2022-08-15 Thread Gabriel Zachmann via Cocoa-dev
> I noticed you release the fileProps but didn’t release the image, but I don’t 
> know if that’s one of those details you left out for clarity.

Good point, and sorry for the confusion.
Yes, I do release the image.

> Also, depending on some factors like mutability, while the initWithString 
> call with a CFStringRef might essentially be a no-op, you can just do the 
> typecast on the dateref and pass it directly into dateFromString.

Thanks, that should work.

A detail I left out in-between: whitespace trimming.

Or, does anyone know if dateFromString will handle trailing/leading whitespaces 
gracefully?

>
> One thing I’d suggest is to do the work for each image asynchronously on a 
> background queue and have that block (essentially all of your for-loop code) 
> report its completion by some asynchronous way like posting a notification on 
> the original queue

Yes, I will try to do that, in particular, to let the user know about the 
progress.


G.





Re: Retrieving the EXIF date/time from 250k images

2022-08-15 Thread Gabriel Zachmann via Cocoa-dev
> I think an important question here is, what exactly are you trying to do? Do 
> you

Good point, I should have mentioned it:

I would like to sort the list of images by their EXIF date/time.

> really expect to process these images repeatedly? 3.5m for a quarter million 
> pics isn't terrible when one already possesses a working solution.

True.
But I was wondering if it is really the best solution.


Best regards, Gabriel






Re: Retrieving the EXIF date/time from 250k images

2022-08-14 Thread Allyn Bauer via Cocoa-dev
Sorry if there's dupe emails.

I think an important question here is, what exactly are you trying to do?
Do you really expect to process these images repeatedly? 3.5m for a quarter
million pics isn't terrible when one already possesses a working solution.
If you have a specific task in mind, it would be helpful to know what level
of performance would be acceptable.

On Sun, Aug 14, 2022 at 8:51 PM Gary L. Wade via Cocoa-dev <
cocoa-dev@lists.apple.com> wrote:

> I noticed you release the fileProps but didn’t release the image, but I
> don’t know if that’s one of those details you left out for clarity.  Also,
> depending on some factors like mutability, while the initWithString call
> with a CFStringRef might essentially be a no-op, you can just do the
> typecast on the dateref and pass it directly into dateFromString.
>
> One thing I’d suggest is to do the work for each image asynchronously on a
> background queue and have that block (essentially all of your for-loop
> code) report its completion by some asynchronous way like posting a
> notification on the original queue along with the result you care about,
> the parsed date associated with the particular file.  Let the original
> queue handle how to store each parsed date; it would probably be best to
> use a dictionary where the key was the filename and value is the date.  To
> prevent memory pressure, allocate your background queue so that it’s
> concurrent and autorelease frequency is set to be workItem.  If you want to
> be sure to know when everything’s done, you could use a DispatchGroup to
> track those and you could choose to pass back NSNull or nil for the parsed
> result if the date could not be parsed.
>
> Of course, this will depend on if your file system is non-network-based
> and whether it’s SSD vs HD as well as other physical system factors.
> --
> Gary
>
> > On Aug 14, 2022, at 2:22 PM, Gabriel Zachmann via Cocoa-dev <
> cocoa-dev@lists.apple.com> wrote:
> >
> > I would like to collect the date/time stored in an EXIF tag in a bunch
> of images.
> >
> > I thought I could do so with the following procedure
> > (some details and error checking omitted for sake of clarity):
> >
> >
> >    NSMutableArray * dates_and_times = [NSMutableArray arrayWithCapacity: [imagefiles count]];
> >    CFDictionaryRef exif_dict;
> >    CFStringRef dateref = NULL;
> >    for ( NSString* filename in imagefiles )
> >    {
> >        NSURL * imgurl = [NSURL fileURLWithPath: filename isDirectory: NO];  // escapes any chars that are not allowed in URLs (space, &, etc.)
> >        CGImageSourceRef image = CGImageSourceCreateWithURL( (__bridge CFURLRef) imgurl, NULL );
> >        CFDictionaryRef fileProps = CGImageSourceCopyPropertiesAtIndex( image, 0, NULL );
> >        bool success = CFDictionaryGetValueIfPresent( fileProps, kCGImagePropertyExifDictionary, (const void **) & exif_dict );
> >        success = CFDictionaryGetValueIfPresent( exif_dict, kCGImagePropertyExifDateTimeDigitized, (const void **) & dateref );
> >        NSString * date_str = [[NSString alloc] initWithString: (__bridge NSString * _Nonnull)( dateref ) ];
> >        NSDate * iso_date = [isoDateFormatter_ dateFromString: date_str];
> >        if ( iso_date )
> >            [dates_and_times addObject: iso_date ];
> >        CFRelease( fileProps );
> >    }
> >
> >
> > But I get the impression that this code actually loads each and every image.
> > On my MacBook, it takes 3m30s for 250k images (130GB).
> >
> > So, the big question is: can it be done faster?
> >
> > I know the EXIF tags are part of the image file, but I was hoping it
> > might be possible to load only those EXIF dictionaries.
> > Or are the CGImage functions above already clever enough to implement
> > this idea?
> >
> >
> > Best regards, Gab.
> >
>
___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com


Re: Retrieving the EXIF date/time from 250k images

2022-08-14 Thread Gary L. Wade via Cocoa-dev
I noticed you release the fileProps but didn’t release the image, but I don’t 
know if that’s one of those details you left out for clarity.  Also, while the 
initWithString call with a CFStringRef might essentially be a no-op depending 
on factors like mutability, you can just typecast the dateref and pass it 
directly into dateFromString.
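
On the date string itself: EXIF stores these timestamps as
"YYYY:MM:DD HH:MM:SS", with colons in the date part, so whichever formatter
sits behind isoDateFormatter_ has to be configured for that layout rather than
for a true ISO 8601 string. As a rough illustration of the format only, here
is a minimal sketch in plain C (which also compiles inside an Objective-C
file). The function name parse_exif_datetime is made up for this example and
is not part of ImageIO or Foundation.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: parses an EXIF timestamp "YYYY:MM:DD HH:MM:SS".
 * Returns 1 and fills *out on success, 0 if the string does not match.
 * The field carries no time zone, so the result is wall-clock time. */
static int parse_exif_datetime(const char *s, struct tm *out)
{
    int y, mo, d, h, mi, sec;
    char trailing;

    if (s == NULL || out == NULL)
        return 0;
    /* Exactly six fields must convert; the %c detects trailing characters. */
    if (sscanf(s, "%4d:%2d:%2d %2d:%2d:%2d%c",
               &y, &mo, &d, &h, &mi, &sec, &trailing) != 6)
        return 0;
    /* Range checks also reject the all-zero placeholder some files contain. */
    if (y < 1 || mo < 1 || mo > 12 || d < 1 || d > 31 ||
        h < 0 || h > 23 || mi < 0 || mi > 59 || sec < 0 || sec > 60)
        return 0;

    memset(out, 0, sizeof *out);
    out->tm_year  = y - 1900;   /* struct tm counts years from 1900 */
    out->tm_mon   = mo - 1;     /* and months from 0 */
    out->tm_mday  = d;
    out->tm_hour  = h;
    out->tm_min   = mi;
    out->tm_sec   = sec;
    out->tm_isdst = -1;         /* unknown; let mktime() decide if it is used */
    return 1;
}
```

Returning 0 for an unparseable string corresponds to the "pass back NSNull or
nil" case: the caller decides what a missing date means.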

One thing I’d suggest is to do the work for each image asynchronously on a 
background queue and have that block (essentially all of your for-loop code) 
report its completion asynchronously, for example by posting a notification on 
the original queue along with the result you care about: the parsed date 
associated with the particular file.  Let the original queue handle how to 
store each parsed date; it would probably be best to use a dictionary where the 
key is the filename and the value is the date.  To limit memory pressure, 
create your background queue so that it’s concurrent and its autorelease 
frequency is set to workItem.  If you want to know when everything’s done, you 
could use a DispatchGroup to track the work items, and you could choose to pass 
back NSNull or nil for the parsed result if the date could not be parsed.

Of course, how much this helps will depend on whether your file system is local 
rather than network-based and whether it’s on an SSD or a hard disk, as well as 
other physical system factors.
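
The shape of that suggestion (several workers running at once, one result slot
per file, and a single point where the caller waits for everything) can be
sketched in portable C. Note the swap: this uses POSIX threads and an atomic
counter in place of a GCD concurrent queue and DispatchGroup, so it only
illustrates the pattern and is not the GCD code itself. The names job_t,
run_all and demo_extract are invented for this example; demo_extract stands in
for the real per-image work.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORKERS 16

/* Hypothetical per-file callback: returns 1 and fills *result on success. */
typedef int (*extract_fn)(const char *path, long *result);

typedef struct {
    const char **paths;   /* input file names */
    long *results;        /* one slot per file, written by exactly one worker */
    int *ok;              /* 1 if results[i] is valid, 0 if extraction failed */
    size_t count;         /* number of files */
    atomic_size_t next;   /* index of the next unclaimed file */
    extract_fn extract;
} job_t;

/* Stand-in extractor for demonstration only: "parses" a path by measuring it. */
static int demo_extract(const char *path, long *result)
{
    size_t n = strlen(path);
    if (n == 0)
        return 0;
    *result = (long)n;
    return 1;
}

/* Each worker claims the next index atomically until none are left. */
static void *worker(void *arg)
{
    job_t *job = arg;
    for (;;) {
        size_t i = atomic_fetch_add(&job->next, 1);
        if (i >= job->count)
            break;
        job->ok[i] = job->extract(job->paths[i], &job->results[i]);
    }
    return NULL;
}

/* Processes every path on up to nworkers threads and blocks until all are
 * done. Returns the number of threads started; 0 means nothing was processed. */
static int run_all(job_t *job, int nworkers)
{
    pthread_t tids[MAX_WORKERS];
    int started = 0;

    if (nworkers < 1)
        nworkers = 1;
    if (nworkers > MAX_WORKERS)
        nworkers = MAX_WORKERS;
    atomic_init(&job->next, 0);

    for (int i = 0; i < nworkers; i++)
        if (pthread_create(&tids[started], NULL, worker, job) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);   /* the "wait for the group" step */
    return started;
}
```

Because each index is claimed by exactly one worker, the result slots need no
lock. Raising nworkers until the total time stops improving is one way to find
out where the disk, rather than the CPU, becomes the limit.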
--
Gary

> On Aug 14, 2022, at 2:22 PM, Gabriel Zachmann via Cocoa-dev wrote:
> 
> I would like to collect the date/time stored in an EXIF tag in a bunch of 
> images.
> 
> I thought I could do so with the following procedure
> (some details and error checking omitted for sake of clarity):
> 
> 
>    NSMutableArray * dates_and_times = [NSMutableArray arrayWithCapacity: [imagefiles count]];
>    CFDictionaryRef exif_dict;
>    CFStringRef dateref = NULL;
>    for ( NSString* filename in imagefiles )
>    {
>        NSURL * imgurl = [NSURL fileURLWithPath: filename isDirectory: NO];  // escapes any chars that are not allowed in URLs (space, &, etc.)
>        CGImageSourceRef image = CGImageSourceCreateWithURL( (__bridge CFURLRef) imgurl, NULL );
>        CFDictionaryRef fileProps = CGImageSourceCopyPropertiesAtIndex( image, 0, NULL );
>        bool success = CFDictionaryGetValueIfPresent( fileProps, kCGImagePropertyExifDictionary, (const void **) & exif_dict );
>        success = CFDictionaryGetValueIfPresent( exif_dict, kCGImagePropertyExifDateTimeDigitized, (const void **) & dateref );
>        NSString * date_str = [[NSString alloc] initWithString: (__bridge NSString * _Nonnull)( dateref ) ];
>        NSDate * iso_date = [isoDateFormatter_ dateFromString: date_str];
>        if ( iso_date )
>            [dates_and_times addObject: iso_date ];
>        CFRelease( fileProps );
>    }
> 
> 
> But I get the impression that this code actually loads each and every image.
> On my MacBook, it takes 3m30s for 250k images (130GB).
> 
> So, the big question is: can it be done faster?
> 
> I know the EXIF tags are part of the image file, but I was hoping it might be 
> possible to load only those EXIF dictionaries.
> Or are the CGImage functions above already clever enough to implement this 
> idea?
> 
> 
> Best regards, Gab.
> 
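
On the question of touching only the EXIF data: in a JPEG file the Exif block
lives in an APP1 segment that precedes the compressed image data, so in
principle it can be found by walking the segment headers alone. The sketch
below is plain C and shows just that walk over an in-memory buffer. It is an
illustration of the file layout, not a description of what ImageIO does
internally, and find_exif_tiff is a name made up for this example. Parsing the
TIFF directory that follows to reach the actual date tag is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: scans the segment headers of a JPEG in buf[0..len)
 * and returns the offset of the TIFF header inside the Exif APP1 segment,
 * storing the number of TIFF bytes in *tiff_len. Returns 0 if no Exif
 * segment is found. Compressed image data is never inspected. */
static size_t find_exif_tiff(const unsigned char *buf, size_t len,
                             size_t *tiff_len)
{
    size_t pos = 2;

    /* A JPEG stream must begin with the SOI marker FF D8. */
    if (len < 4 || buf[0] != 0xFF || buf[1] != 0xD8)
        return 0;

    while (pos + 4 <= len) {
        if (buf[pos] != 0xFF)
            return 0;                        /* not at a marker: give up */
        unsigned marker = buf[pos + 1];
        if (marker == 0xFF) {                /* fill byte before a marker */
            pos++;
            continue;
        }
        if (marker == 0xDA || marker == 0xD9)
            return 0;                        /* SOS or EOI: metadata is over */

        /* Segment length is big-endian and includes its own two bytes. */
        size_t seg_len = ((size_t)buf[pos + 2] << 8) | buf[pos + 3];
        if (seg_len < 2 || pos + 2 + seg_len > len)
            return 0;                        /* truncated or corrupt */

        /* APP1 (FF E1) whose payload starts with "Exif\0\0" holds the data. */
        if (marker == 0xE1 && seg_len >= 8 &&
            memcmp(buf + pos + 4, "Exif\0\0", 6) == 0) {
            *tiff_len = seg_len - 8;         /* minus length field and ID */
            return pos + 10;                 /* marker + length + ID */
        }
        pos += 2 + seg_len;                  /* skip to the next segment */
    }
    return 0;
}
```

Since a segment length is a 16-bit field, one segment cannot exceed 64 KB, and
the Exif segment usually sits near the start of the file. That is why reading
a small prefix of each file can be enough for this kind of scan, although
other container formats (HEIC, raw files) are laid out differently.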
