Re: [Valgrind-users] Question about Valgrind tool in Intel new platform
On 4/20/22 05:18, Yang Zhong wrote:
> The AMX is the NEW feature in Intel new platform and from host, we can find below cpu flags: amx_bf16, amx_tile, amx_int8
> The SPEC can be found in: https://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
> The issue I mentioned should be related with AMX features missed in valgrind emulated CPU. If someone will implement this feature on valgrind, I can help verify. Thanks!

If you really want to help, then start today by collecting and/or writing actual code that emulates the hardware that implements the feature. Collect (or find, or write) the code from Chapter 3, "INTEL® AMX INSTRUCTION SET REFERENCE, A-Z", of that .pdf. Create actual subroutines and data declarations, and *test* it against your apps. Put the code into a public repository such as GitHub. The top-level function should be something like

    unsigned char const *emulate_amx(  // returns next instruction pointer
        unsigned char const *ip,  // pointer to first byte of instruction stream
        unsigned long *general_registers[16],  // hardware state
        unsigned long long *zmm_registers[16],  // zmm (ymm, xmm) registers
        struct Xsave *xsave_area,  // tile registers etc.
        ...
    }

which if successful returns a pointer to the next instruction, else an error code which is the negative of a small positive integer.

Such code will go a long way towards getting AMX supported by valgrind, because it will enable valgrind-developers to focus on implementing valgrind instead of on finding, de-ciphering, and mentally interpreting documentation.

___
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
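As a rough illustration of the kind of starting point suggested above, here is a minimal sketch that emulates a single AMX instruction, TILERELEASE (encoded as C4 E2 78 49 C0), which returns the tile configuration and tile data to their initial all-zero state. Everything else is an assumption made for the example: `struct amx_state` is a simplified stand-in for the tile part of the XSAVE area, the name `emulate_amx_sketch`, the `AMX_ERROR` macro and the `amx_is_error` helper are hypothetical, and the general and zmm register arguments from the signature above are left out because this one instruction does not touch them.

```c
#include <stdint.h>
#include <string.h>

/* Architectural sizes: 8 tile registers of 16 rows x 64 bytes, 64-byte TILECFG. */
enum { AMX_NUM_TILES = 8, AMX_TILE_BYTES = 1024, AMX_TILECFG_BYTES = 64 };

/* Hypothetical, simplified stand-in for the AMX part of the XSAVE area. */
struct amx_state {
    uint8_t tilecfg[AMX_TILECFG_BYTES];
    uint8_t tiledata[AMX_NUM_TILES][AMX_TILE_BYTES];
};

/* Hypothetical error numbering: small positive integers, returned negated. */
enum { AMX_ERR_UNSUPPORTED = 1 };

/* Encode "the negative of a small positive integer" in the pointer return value. */
#define AMX_ERROR(code) ((unsigned char const *)(intptr_t)(-(code)))

/* Assumption: valid instruction pointers never fall in the top 4095 addresses. */
int amx_is_error(unsigned char const *p)
{
    return (intptr_t)p < 0 && (intptr_t)p >= -4095;
}

/*
 * Emulate one instruction at ip. The caller must guarantee that at least
 * 5 bytes are readable at ip. Returns the address of the next instruction,
 * or an encoded error if the bytes are not an instruction this sketch knows.
 */
unsigned char const *emulate_amx_sketch(unsigned char const *ip,
                                        struct amx_state *st)
{
    /* VEX.128.NP.0F38.W0 49 C0 = TILERELEASE */
    static unsigned char const tilerelease[] = { 0xC4, 0xE2, 0x78, 0x49, 0xC0 };

    if (memcmp(ip, tilerelease, sizeof tilerelease) == 0) {
        memset(st, 0, sizeof *st);          /* TILECFG and TILEDATA back to INIT */
        return ip + sizeof tilerelease;
    }
    return AMX_ERROR(AMX_ERR_UNSUPPORTED);
}
```

A real emulator would decode the VEX prefix properly and cover the load, store and dot-product instructions; the point here is only the calling convention (next instruction pointer on success, encoded error otherwise) and the fact that each instruction can be tested in isolation.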
Re: [Valgrind-users] Question about Valgrind tool in Intel new platform
On 20/04/2022 13:41, Tom Hughes via Valgrind-users wrote:
> Again until we know what "AMX features" are it's impossible to comment in any detail.

So apparently AMX is this: https://en.wikipedia.org/wiki/Advanced_Matrix_Extensions

So not only is it new instructions, it is new two-dimensional registers, so it's likely to be a huge task to add support. I think we're still trying to get the AVX512 support merged, so that might give you some idea of the timelines on this sort of change.

Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/
Re: [Valgrind-users] Question about Valgrind tool in Intel new platform
On 20/04/2022 13:18, Yang Zhong wrote:
> On Wed, Apr 20, 2022 at 09:37:17AM +0100, Tom Hughes wrote:
> > On 20/04/2022 09:01, Yang Zhong wrote:
> > > So, from above issue in Intel new platform, the valgrind need do some enablings to be compatible with on new platform? Seems valgrind tool can't identify the real HW platform because cpuid can't read correct register value. thanks!
> >
> > When running under valgrind you are running on an emulated CPU not the real CPU and the results of cpuid will reflect the capabilities of that emulated CPU rather than the real CPU.
> >
> > Do the bits that you are trying to check reflect something (like new instructions) that valgrind will need to be concerned about?
>
> Thanks Tom for your quickly response! The AMX is the NEW feature in Intel new platform and from host, we can find below cpu flags: amx_bf16, amx_tile, amx_int8

That tells me nothing.

> The SPEC can be found in: https://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf

No, I'm not going to spend my day digging through thousands of pages of the latest instruction set reference trying to figure out what exactly this feature is...

> The issue I mentioned should be related with AMX features missed in valgrind emulated CPU. If someone will implement this feature on valgrind, I can help verify. Thanks!

Again, until we know what "AMX features" are it's impossible to comment in any detail. If AMX features involved new instructions then yes, it will definitely need somebody to do the work to add support for them.

Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/
Re: [Valgrind-users] Question about Valgrind tool in Intel new platform
On Wed, Apr 20, 2022 at 09:37:17AM +0100, Tom Hughes wrote:
> On 20/04/2022 09:01, Yang Zhong wrote:
>
> >So, from above issue in Intel new platform, the valgrind need do some enablings to be compatible
> >with on new platform? Seems valgrind tool can't identify the real HW platform because cpuid can't
> >read correct register value. thanks!
>
> When running under valgrind you are running on an emulated CPU not
> the real CPU and the results of cpuid will reflect the capabilities
> of that emulated CPU rather than the real CPU.
>
> Do the bits that you are trying to check reflect something (like new
> instructions) that valgrind will need to be concerned about?
>

Thanks, Tom, for your quick response!

AMX is a new feature on Intel's new platform. On the host, we can find the following CPU flags: amx_bf16, amx_tile, amx_int8

The spec can be found at: https://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf

The issue I mentioned should be related to the AMX features being missing from the CPU that valgrind emulates. If someone implements this feature in valgrind, I can help verify it. Thanks!

Yang

> Tom
>
> --
> Tom Hughes (t...@compton.nu)
> http://compton.nu/
Re: [Valgrind-users] Question about Valgrind tool in Intel new platform
On 20/04/2022 09:01, Yang Zhong wrote:
> So, from above issue in Intel new platform, the valgrind need do some enablings to be compatible with on new platform? Seems valgrind tool can't identify the real HW platform because cpuid can't read correct register value. thanks!

When running under valgrind you are running on an emulated CPU, not the real CPU, and the results of cpuid will reflect the capabilities of that emulated CPU rather than the real CPU.

Do the bits that you are trying to check reflect something (like new instructions) that valgrind will need to be concerned about?

Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/
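To make the cpuid discussion in this thread concrete, the following sketch reads the three AMX feature bits that correspond to the amx_bf16, amx_tile and amx_int8 flags. They live in CPUID leaf 7, sub-leaf 0, register EDX (bits 22, 24 and 25). The struct and function names are made up for the example, and it assumes a GCC or Clang toolchain that ships `<cpuid.h>` with `__get_cpuid_count`. Run natively, `query_amx_flags` reports what the hardware supports; run under valgrind, it reports what the emulated CPU advertises, which is why the same program can see different answers.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Bit positions in CPUID.(EAX=7,ECX=0):EDX. */
#define AMX_BF16_BIT 22u
#define AMX_TILE_BIT 24u
#define AMX_INT8_BIT 25u

/* Hypothetical result type for this example. */
struct amx_flags {
    bool bf16;
    bool tile;
    bool int8;
};

/* Pure decoding step: turn a raw EDX value into named flags. */
struct amx_flags decode_amx_flags(uint32_t edx)
{
    struct amx_flags f;
    f.bf16 = (edx >> AMX_BF16_BIT) & 1u;
    f.tile = (edx >> AMX_TILE_BIT) & 1u;
    f.int8 = (edx >> AMX_INT8_BIT) & 1u;
    return f;
}

/*
 * Ask whatever CPU the program is running on (real or emulated).
 * Returns all-false on non-x86 builds or when leaf 7 is not available.
 */
struct amx_flags query_amx_flags(void)
{
    uint32_t edx = 0;
#if defined(__x86_64__) || defined(__i386__)
    unsigned int a, b, c, d;
    if (__get_cpuid_count(7, 0, &a, &b, &c, &d))
        edx = d;
#endif
    return decode_amx_flags(edx);
}
```

An application that checks these bits before using AMX would fall back to a non-AMX code path whenever the emulated CPU does not advertise the feature, which avoids hitting instructions the emulator cannot decode.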
Re: [Valgrind-users] Question about Valgrind tool implementation
On Mon, 2013-09-09 at 10:50 +0900, Chang-Jae Lee wrote:
> The first definition of variable c is at line 11. We then suppress the line 11, and subsequent use of variable c(at line 12)

The above defines what to do in terms of source lines, while valgrind works at the binary level. There is not necessarily a one-to-one mapping between a source line and a (contiguous) range of instructions. Also, what if the source code has more than one statement on a single line (either because that is how it was typed or because of preprocessor macros)?

At this stage, implementing what you describe in a valgrind tool looks like quite a challenge to me. Sorry not to be able to help with this.

Philippe
Re: [Valgrind-users] Question about Valgrind tool implementation
On Fri, Sep 6, 2013 at 5:43 AM, Philippe Waroquiers philippe.waroqui...@skynet.be wrote:
> On Thu, 2013-09-05 at 16:01 +0900, Chang-Jae Lee wrote:
>
> Not too sure about what you mean with the above. Valgrind works at binary level, it does not really have a notion of statement. For example, if in the code you have:
>     f()
>     {
>        char *ptr1;
>        char *ptr2;
> these two statements will just be part of the stack setup (e.g. change the stack pointer) and so there is no way to remove the instruction corresponding to e.g. only the first ptr definition.
>
> As I do not understand the tool you have to write, I have no idea how to best do what you need.

Here is an example of execution suppression from the paper:

- Let x and y be pointers to two malloc'ed memory regions, each able to hold two integers.
- Let intArray be a heap array of integers.
- Let structArray be a heap array of pointers to structs with a field f.

 1: int * p1 = &x[1];
 2: int * p2 = &x[0];
 3: int * q1 = &y[1];
 4: int * q2 = &x[0];         // copy-paste error: should be &y[0]
 5: *p1 = readInt();
 6: *p2 = readInt();          // gets clobbered at line 8
 7: *q1 = readInt();
 8: *q2 = readInt();          // clobbers line 6 definition
 9: int a = *p1 + *p2;        // uses infected *p2/*q2
10: int b = *q1 + *q2;        // uses infected *p2/*q2
11: int c = a + b + 1;        // uses infected a and b
12: intArray[c] = 0;          // potential buffer overflow
13: structArray[*p2]->f = 0;  // potential NULL dereference
14: free(p2);
15: free(q2);                 // potential double free

Line 4 has a copy-paste error that corrupts the pointer q2. From there, several reads and writes go through the corrupted location.

Let's suppose that at first the crash occurs at line 12, because of an access to an illegal memory address. The first definition of variable c is at line 11. We then suppress line 11 and the subsequent use of variable c (at line 12).

Running the program now results in another crash, at line 13. The use of p2/q2 causes the crash, so we suppress the definition of *q2 at line 8 and the subsequent uses of *q2 at lines 9 and 10.

Running the program again still crashes, this time at line 15 because of the double free. We suppress the definition of q2 at line 4 and the use of q2 at line 15; running again, no crash occurs. So we can conclude that line 4 is the first location of memory corruption.

About uninitialized pointers, as Philippe mentioned: you're right, in that case the definition is not suppressible. But suppose there is code like this:

void foo() {
    char* pt1;
    char* pt2;
    pt1[0] = 'X';
    printf("%s\n", pt1);
}

The crash occurs at pt1[0] = 'X'; so we suppress this statement (since no value is ever assigned to pt1, this is the only suppression point), and on the next run there is no crash. Therefore we can conclude that using an uninitialized pointer is the root cause of the crash.
Re: [Valgrind-users] Question about Valgrind tool implementation
On Thu, 2013-09-05 at 16:01 +0900, Chang-Jae Lee wrote:
> Hi, I am a grad-student in KAIST, and I'm working on a project for finding bugs or errors. Currently I'm following a routine from the paper "Execution Suppression: An Automated Iterative Technique for Locating Memory Errors". It is about finding the root cause of memory error(s) when a program shows a crash, by suppressing the code statement which defines that memory location and subsequent statements using the location and restart the program, until no crash happens.
>
> So what I need here is,
> - How can I handle target application's segmentation fault in my tool? First I ran my target with Lackey and it gets SIGSEGV, alerts to me, and returns 0, but the last thing it does is saying that it was terminated with segmentation fault. here I attached the log of Lackey.

From what I can see, you will have to modify the Valgrind core to let the tool intercept the guest signals. At first sight, your tool might install a fault catcher using VG_(set_fault_catcher). However, such a fault catcher can currently only run in non-generated code (see sync_signalhandler_from_kernel). You might have to change that. Some other changes may be needed too (such as allowing the fault catcher to indicate that the signal is not to be propagated). You will find a current use of such a fault catcher in memcheck (mc_leakcheck.c), but not in generated code.

> - I need to suppress instructions which stands for a single code statement, like defining pointers or accessing particular memory addresses. Looks like the core connects debug information if there is one. Then, how does the tool recognize it (like memcheck does)? Is VEX IR superblock contains about it?

I am not too sure what you mean by the above. Valgrind works at the binary level; it does not really have a notion of a statement. For example, if in the code you have:

    f()
    {
       char *ptr1;
       char *ptr2;

these two statements will just be part of the stack setup (e.g. changing the stack pointer), so there is no way to remove the instruction corresponding to, say, only the first ptr definition.

As I do not understand the tool you have to write, I have no idea how best to do what you need.

Philippe
Re: [Valgrind-users] Question about Valgrind
On 3/23/2012 9:10 AM, Hamid Reza Khaleghzadeh wrote:
> Hi, I have some questions about Valgrind tool. I would be thankful if answer me.
> 1- I want to know does Valgrind support Pthread and openMP programs?
> 2- I need a tool that traces my multi-threaded program and creates memory trace of the program. The trace must contain accessed memory address, threadID and access type(Read/Write).
> Sorry to bother you with my questions.

It does support multiple threads, but runs only one thread at a time. I don't personally know the answer to the OpenMP question.

For a memory trace, you can take and extend the lackey tool slightly to have it output the thread id in addition to the type of reference, the address, and the number of bytes referenced. That should be a straightforward change if you can find where to get the thread id in valgrind. (I've not looked for that information, but system call tracing in valgrind certainly prints it, so it should not be hard to get.)

Regards -- Eliot Moss
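The trace record described above (thread id, access type, address, size) can be sketched as a small standalone formatter. This is not Lackey code: the struct, the function name and the exact text layout are assumptions chosen to resemble the "type address,size" lines that Lackey prints with --trace-mem=yes, with a thread id added in front. Inside a real tool the thread id would have to come from the Valgrind core (the tool-visible VG_(get_running_tid)() call looks like the natural candidate, though that should be checked against the headers of the Valgrind version in use).

```c
#include <stdio.h>
#include <stdint.h>

/* Access kinds, using single-letter codes for loads and stores. */
enum access_type { ACCESS_LOAD = 'L', ACCESS_STORE = 'S' };

/* Hypothetical trace record: who touched what, how, and how many bytes. */
struct trace_record {
    unsigned tid;            /* thread id that performed the access */
    enum access_type type;   /* load or store */
    uintptr_t addr;          /* accessed address */
    unsigned size;           /* number of bytes referenced */
};

/*
 * Format one record as "<tid> <type> <addr>,<size>".
 * Returns the snprintf result: the number of characters needed,
 * not counting the terminating NUL.
 */
int format_trace_record(char *buf, size_t buflen, const struct trace_record *r)
{
    return snprintf(buf, buflen, "%u %c %08lx,%u",
                    r->tid, (char)r->type, (unsigned long)r->addr, r->size);
}
```

Keeping the thread id as the first field makes it easy to split a combined trace into per-thread traces afterwards with ordinary text tools.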
Re: [Valgrind-users] Question about Valgrind
On 3/23/2012 12:21 PM, Hamid Reza Khaleghzadeh wrote:
> Hi Dear Eliot,

I added valgrind-users back so that all can follow the email thread.

> Thanks for your answer. You said that Valgrind does support multiple threads, but runs only one thread at a time. Suppose a two threaded application. The threads run in parallel. Suppose thread1 read A, thread2 simultaneously write A, and thread1 read A again. In this case, records of output trace are as following (holds order of access) ?
>
> Thread1 read A
> Thread2 write A
> Thread1 read A

It depends on whether the scheduler swaps threads at the two places indicated. Perhaps it will help to think of the trace as being what you might get from running your multi-threaded program on a single cpu, where the operating system periodically switches between runnable threads.

In fact, valgrind will try to switch when the underlying OS does. However, for various reasons valgrind tends to emulate the execution of whole sequences of instructions in one go, and such thread switching will not happen in the middle of a sequence. Thus if Thread1 reads A twice in the *same* such sequence, you will never see Thread2 do anything between the two reads. You may be able to control with your instrumentation where the sequences end, but performance will be very bad if you end them at every read and write. Also, each thread will still normally run for a full scheduler quantum before a switch is attempted.

In sum, you should get one *possible* execution, but it won't necessarily be one typical of truly concurrent execution. (Just because you have threads that are *runnable* at the same time does not mean that they actually run *concurrently* in a real system, unless you guarantee multiple cpus are available. You can think of valgrind as providing a universe where only one cpu is *ever* available.)

There are many good reasons why valgrind is this way; it would be substantially more difficult to build (and maintain!) a similar tool that supported true concurrency.

> Thanks for your answer in advance.

No problem.

Eliot Moss
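The point about instruction sequences can be illustrated with a toy model. This is not valgrind's scheduler; it is a single-threaded simulation, with invented names, in which each "thread" is a list of blocks, a block is a run of reads and writes executed in one go, and the scheduler rotates between threads only at block boundaries after a fixed number of blocks (the quantum).

```c
#include <stddef.h>

/* One entry of the resulting trace. */
struct access {
    int thread;   /* 1-based thread number */
    char type;    /* 'R' for read, 'W' for write */
};

/* A block is a run of accesses that is never interrupted, e.g. "RR". */
struct block {
    const char *ops;
};

/* A toy thread: its blocks and the index of the next block to run. */
struct toy_thread {
    const struct block *blocks;
    size_t nblocks;
    size_t next;
};

/*
 * Run the threads round-robin on one simulated cpu, switching only after
 * `quantum` whole blocks. Writes up to `outcap` trace entries to `out`
 * and returns the total number of accesses performed.
 */
size_t run_serialized(struct toy_thread *threads, size_t nthreads,
                      size_t quantum, struct access *out, size_t outcap)
{
    size_t n = 0;
    int progressed = 1;

    while (progressed) {
        progressed = 0;
        for (size_t t = 0; t < nthreads; t++) {
            struct toy_thread *th = &threads[t];
            for (size_t q = 0; q < quantum && th->next < th->nblocks; q++) {
                const char *ops = th->blocks[th->next++].ops;
                for (; *ops != '\0'; ops++) {
                    if (n < outcap) {
                        out[n].thread = (int)t + 1;
                        out[n].type = *ops;
                    }
                    n++;
                }
                progressed = 1;
            }
        }
    }
    return n;
}
```

With a quantum of 1, if Thread1's two reads sit in one block ("RR") the trace is always read, read, write; the interleaved read, write, read order asked about only becomes possible when the two reads fall into separate blocks.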
Re: [Valgrind-users] Question about Valgrind
> In sum, you should get one *possible* execution, but it won't necessarily be one typical of truly concurrent execution. (Just because you have threads that are *runnable* at the same time does not mean that they actually run *concurrently* in a real system, unless you guarantee multiple cpus are available. You can think of valgrind as providing a universe where only one cpu is *ever* available.)

You can get round-robin scheduling of runnable threads using --fair-sched=yes. And you can change the quantum to something a lot lower by changing this value

   ./coregrind/m_scheduler/scheduler.c:#define SCHEDULING_QUANTUM 100000

to (eg) 1000. Performance will be a lot worse, but you'll get much finer grained interleaving.

Note that when https://bugs.kde.org/show_bug.cgi?id=296422 lands it will somewhat reduce this effect, since one minor effect of it is to reduce the number of event (timeslice-out) checks by about a factor of 3.

J